Abstract

An uncertain database is defined as a relational database in
which primary keys need not be satisfied. A repair (or possible
world) of such a database is obtained by selecting a maximal
number of tuples without ever selecting two distinct tuples
with the same primary key value. For a Boolean query q,
the decision problem CERTAINTY(q) takes as input an uncertain
database db and asks whether q is satisfied by every
repair of db. Our main focus is on acyclic Boolean conjunctive
queries without self-join. Previous work [23] has introduced
the notion of (directed) attack graph of such queries,
and has proved that CERTAINTY(q) is first-order expressible
if and only if the attack graph of q is acyclic. The current
paper investigates the boundary between tractability and
intractability of CERTAINTY(q). We first classify cycles in
attack graphs as either weak or strong, and then prove, among
others, the following. If the attack graph of a query q contains
a strong cycle, then CERTAINTY(q) is coNP-complete. If
the attack graph of q contains no strong cycle and every weak
cycle of it is terminal (i.e., no edge leads from a vertex in the
cycle to a vertex outside the cycle), then CERTAINTY(q) is
in P. We then partially address the only remaining open case,
i.e., when the attack graph contains some nonterminal cycle
and no strong cycle. Finally, we establish a relationship between
the complexities of CERTAINTY(q) and evaluating q
on probabilistic databases.

1 Introduction 

Primary key violations are a natural way for modeling un- 
certainty in the relational model. If two distinct tuples have 
the same primary key value, then at least one of them must 
be mistaken, but we do not know which one. This represen- 
tation of uncertainty is also used in probabilistic databases, 
where each tuple is associated with a probability and distinct
tuples with the same primary key value are disjoint probabilistic
events [18, page 35].

In this paper, the term uncertain database is used for
databases with primary key constraints that need not be satisfied.
A repair (or possible world) of an uncertain database
db is a maximal subset of db that satisfies all primary key
constraints. Semantics of querying follows the conventional
paradigm of consistent query answering [2], [4]: Given a
Boolean query q, the decision problem CERTAINTY(q) takes
as input an uncertain database db and asks whether q is satisfied
by every repair of db. Notice that q is not part of the input,
so the complexity of the problem is data complexity. The
restriction to Boolean queries simplifies the technical treatment,
but is not fundamental.

Figure 1: Uncertain database.

Primary keys are underlined in the conference planning
database of Fig. 1. Maximal sets of tuples that agree on
their primary key, called blocks, are separated by dashed lines.
There is uncertainty about the city of PODS 2016, and about
the rank of KDD. The database has four repairs. The query
$\exists x \exists y\,(C(x, y, \text{'Rome'}) \land R(x, \text{'A'}))$ (Will Rome host some A
conference?) is true in only three repairs.

The problem CERTAINTY (q) is in coNP for first-order 
queries q (a "no" certificate is a repair falsifying q). Its com- 
plexity for conjunctive queries has attracted the attention of 
several authors, also outside the database community Q. A 
major research objective is to find an effective method that
takes as input a conjunctive query q and decides to which
complexity classes CERTAINTY(q) belongs, or does not belong.
Complexity classes of interest are the class of first-order
expressible problems (or $\mathrm{AC}^0$), P, and coNP-complete.

Unless specified otherwise, whenever we say "query" in the 
remainder of this paper, we mean a Boolean conjunctive query 
without self-join (i.e., without repeated relation names). Such 
queries are called acyclic if they have a join tree [3].

Our previous work [21], [23] has revealed the frontier
between first-order expressibility and inexpressibility of
CERTAINTY(q) for acyclic queries q. In the current work,
we study the frontier between tractability and intractability
of CERTAINTY(q) for the same class of queries. That is,
we aim at an effective method that takes as input a query
q and decides whether CERTAINTY(q) is in P or coNP-complete
(or neither of the two, which is theoretically possible
if P $\neq$ coNP [14]). For queries with exactly two atoms,
such a method was recently found by Kolaitis and Pema [13],
but moving from two to more than two atoms is a major
challenge.

Uncertain databases become probabilistic by assuming that
the probabilities of all repairs are equal and sum up to 1. In
probabilistic terms, distinct tuples of the same block represent
disjoint (i.e., exclusive) events, while tuples of distinct
blocks are independent. Such probabilistic databases have
been called block-independent-disjoint (BID). The
tractability/intractability frontier of query evaluation on BID
probabilistic databases has been revealed by Dalvi et al. [8]. Here,
evaluating a Boolean query q is a function problem that takes
as input a BID probabilistic database and asks the probability
(a real number between 0 and 1) that q is true. The decision
problem CERTAINTY(q), on the other hand, simply
asks whether this probability is equal to 1.

In previous work [23], we introduced the (directed) attack
graph of an acyclic query, and showed that CERTAINTY(q) is
first-order expressible if and only if q's attack graph is acyclic.
In the current paper, we study attack graphs in more depth. We 
will classify cycles in attack graphs as either weak or strong. 
The main contributions can then be summarized as follows. 

1. If the attack graph of an acyclic query q contains a strong
cycle, then CERTAINTY(q) is coNP-complete. This
will be Theorem 2.

2. If the attack graph of an acyclic query q contains no
strong cycle and all weak cycles of it are terminal (i.e.,
no edge leads from a vertex in the cycle to a vertex outside
the cycle), then CERTAINTY(q) is in P. This will
be Theorem 3.

3. The only acyclic queries q not covered by the two preceding
results have an attack graph with some nonterminal
cycle and without strong cycle. We provide supporting
evidence for our conjecture that CERTAINTY(q)
is tractable for such queries. Our results imply that
CERTAINTY(q) is tractable for "cycle" queries q of
the form $\exists^{*}(R_1(x_1, x_2) \land R_2(x_2, x_3) \land \dots \land R_{k-1}(x_{k-1}, x_k) \land R_k(x_k, x_1))$.
These queries arise in the work of Fuxman
and Miller [10]. The case k = 2 was solved in [22],
but the case k > 2 was open and will be settled by
Corollary 1.

4. Theorem 6 and its Corollary 2 will establish a
relationship between the tractability frontiers of
CERTAINTY(q) and query evaluation on probabilistic
databases.

Our work significantly extends and generalizes known results 
in the literature. 

The remainder of this paper is organized as follows. The
next section further discusses related work. Section 3 defines
the basic notions of certain conjunctive query answering.
Section 4 defines the notion of attack graph. Sections 5
and 6 show our main intractability and tractability results,
respectively. Section 7 establishes a relationship between the
complexities of CERTAINTY(q) and evaluating query q on
probabilistic databases. Section 8 concludes the paper and
raises challenges for future research. Several proofs have been
moved to an Appendix.



2 More Related Work 

The investigation of CERTAINTY(q) was pioneered by Fuxman
and Miller [9], [10], who defined a class of queries q for
which CERTAINTY(q) is first-order expressible. This class
was later extended by Wijsen [21], [23], who developed
an effective method to decide whether CERTAINTY(q)
is first-order expressible for acyclic queries q. In their
conclusion, Fuxman and Miller [9], [10] raised the question
whether there exist queries q, without self-join, such that
CERTAINTY(q) is in P but not first-order expressible. The
first example of such a query was identified by Wijsen [22].
The current paper identifies a large class of such queries (all
acyclic queries with a cyclic attack graph in which all cycles
are weak and terminal).

Kolaitis and Pema [13] recently showed that for every
query q with exactly two atoms, CERTAINTY(q) is either in
P or coNP-complete, and it is decidable which of the two is
the case. If CERTAINTY(q) is in P and not first-order
expressible, then it can be reduced in polynomial time to the
problem of finding maximal (with respect to cardinality)
independent sets of vertices in claw-free graphs. The latter
problem can be solved in polynomial time by an ingenious
algorithm of Minty [17]. Unfortunately, the proposed reduction is
not applicable to queries with more than two atoms.

The counting variant of CERTAINTY(q), which has
been denoted $\sharp$CERTAINTY(q), takes as input an uncertain
database db and asks to determine the number of repairs of
db that satisfy query q. Maslowski and Wijsen [16], [15] have
recently shown that for every query q, the counting problem
$\sharp$CERTAINTY(q) is either in FP or $\sharp$P-complete, and it is
decidable which of the two is the case.

As observed in Section 1, uncertain databases are a
restricted case of block-independent-disjoint (BID) probabilistic
databases [7], [8]. This observation will be elaborated in
Section 7.

All aforementioned results assume queries without self-join.
For queries q with self-joins, only fragmentary results
about the complexity of CERTAINTY(q) are known [16], [20].
The extension to unions of conjunctive queries has been stud- 
ied in na. 

3 Preliminaries 

We assume disjoint sets of variables and constants. If x is 
a sequence containing variables and constants, then vars(x) 
denotes the set of variables that occur in x, and |x| denotes the 
length of x. 

Let U be a set of variables. A valuation over U is a total
mapping $\theta$ from U to the set of constants. Such a valuation $\theta$ is
extended to be the identity on constants and on variables not
in U.

Atoms and key-equal facts. Every relation name R has
a fixed signature, which is a pair [n,k] with $n \geq k \geq 1$: the
integer n is the arity of the relation name and {1,2,...,k} is
the primary key. The relation name R is all-key if n = k. If R is
a relation name with signature [n,k], then $R(s_1, \dots, s_n)$ is an
R-atom (or simply atom), where each $s_i$ is either a constant or a
variable ($1 \leq i \leq n$). Such an atom is commonly written as $R(\underline{x}, y)$
where the primary key value $x = s_1, \dots, s_k$ is underlined and
$y = s_{k+1}, \dots, s_n$. A fact is an atom in which no variable occurs.
Two facts $R_1(\underline{a_1}, b_1)$, $R_2(\underline{a_2}, b_2)$ are key-equal if $R_1 = R_2$ and
$a_1 = a_2$.

We will use letters F,G,H,I for atoms, and A,B,C for facts 
of an uncertain database. For atom F = R(x,y), we denote by 
key(F) the set of variables that occur in x, and by vars(F) the 
set of variables that occur in F, that is, key(F) = vars(x) and 
vars(F) = vars(x) U vars(y). 

Uncertain database, blocks, and repairs. A database 
schema is a finite set of relation names. All constructs that 
follow are defined relative to a fixed database schema. 

An uncertain database is a finite set db of facts using only 
the relation names of the schema. A block of db is a maximal 
set of key-equal facts of db. If $A \in \mathbf{db}$, then block(A, db)
denotes the block of db containing A. An uncertain database
db is consistent if it does not contain two distinct facts that are
key-equal (i.e., if every block of db is a singleton). A repair
of db is a maximal consistent subset of db.¹
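
The definitions of block and repair can be illustrated with a minimal Python sketch. The fact encoding (relation name, key tuple, non-key tuple) and the sample database are assumptions made for this illustration and do not come from the paper. Since a repair picks exactly one fact from every block, the number of repairs is the product of the block sizes.

```python
from itertools import groupby, product

# Assumed encoding: a fact is (relation_name, key_tuple, non_key_tuple);
# the split mirrors the signature [n, k] of the relation name.
def blocks(db):
    """Group facts into blocks, i.e., maximal sets of key-equal facts."""
    key_of = lambda fact: (fact[0], fact[1])
    return [list(g) for _, g in groupby(sorted(db, key=key_of), key=key_of)]

def repairs(db):
    """Yield every repair: exactly one fact chosen from each block."""
    for choice in product(*blocks(db)):
        yield frozenset(choice)

# Hypothetical uncertain database (not taken from the paper).
db = {
    ("R", ("a",), ("1",)),
    ("R", ("a",), ("2",)),
    ("R", ("b",), ("1",)),
    ("S", ("1",), ("a",)),
    ("S", ("1",), ("b",)),
}
all_repairs = list(repairs(db))
```

Enumerating repairs this way takes exponential time in general, so the sketch is only meant to make the definitions concrete.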

Boolean conjunctive query. A Boolean conjunctive query
is a finite set $q = \{R_1(\underline{x_1}, y_1), \dots, R_n(\underline{x_n}, y_n)\}$ of atoms. By
vars(q), we denote the set of variables that occur in q. The set
q represents the first-order sentence

$$\exists u_1 \dots \exists u_k\,(R_1(\underline{x_1}, y_1) \land \dots \land R_n(\underline{x_n}, y_n)),$$

where $\{u_1, \dots, u_k\} = \mathrm{vars}(q)$. The query q is satisfied by
uncertain database db, denoted $\mathbf{db} \models q$, if there exists a
valuation $\theta$ over vars(q) such that for each $i \in \{1, \dots, n\}$,
$R_i(\theta(x_i), \theta(y_i)) \in \mathbf{db}$. We say that q has a self-join if some
relation name occurs more than once in q (i.e., if $R_i = R_j$ for
some $1 \leq i < j \leq n$).

The restriction to Boolean queries simplifies the technical 
treatment, but is not fundamental. Since every relation name 
has a fixed signature, relevant primary key constraints are im- 
plicitly present in all queries; moreover, primary keys will be 
underlined. 

Join tree and acyclic conjunctive query. The notions of
join tree and acyclicity [3] are recalled next. A join tree for
a conjunctive query q is an undirected tree whose vertices are
the atoms of q such that the following condition is satisfied:

Connectedness Condition. Whenever the same variable
x occurs in two atoms F and G, then x occurs
in each atom on the unique path linking F and G.

Commonly, an edge between atoms F and G is labeled by the
(possibly empty) set $\mathrm{vars}(F) \cap \mathrm{vars}(G)$. The term Connectedness
Condition appears in [11] and refers to the fact that the
set of vertices in which x occurs induces a connected subtree.
A conjunctive query q is acyclic if it has a join tree. The symbol
$\tau$ will be used for join trees. We write $F \stackrel{L}{\frown} G$ to denote an
edge between F and G with label L. A join tree is shown in
Fig. 2 (left).

¹ It makes no difference whether the word "maximal" refers to cardinality

Certain query answering. Given a Boolean conjunctive
query q, CERTAINTY(q) is (the complexity of) the following
set.

$$\mathrm{CERTAINTY}(q) = \{\mathbf{db} \mid \mathbf{db} \text{ is an uncertain database such that every repair of } \mathbf{db} \text{ satisfies } q\}$$

CERTAINTY(q) is said to be first-order expressible if there
exists a first-order sentence $\varphi$ such that for every uncertain
database db, $\mathbf{db} \in \mathrm{CERTAINTY}(q)$ if and only if $\mathbf{db} \models \varphi$. The
formula $\varphi$, if it exists, is called a certain first-order rewriting
of q.
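
Read literally, the definition of CERTAINTY(q) suggests a brute-force decision procedure, sketched below in Python. The fact and atom encodings, the `?` prefix for variables, and the database contents are assumptions of this sketch; the data are hypothetical and merely chosen to be consistent with the description of Fig. 1 in Section 1 (four repairs, three of which satisfy the query). The procedure is exponential in the number of blocks and is not one of the algorithms developed in this paper.

```python
from itertools import product

def repairs(db):
    """Yield all repairs of db; a fact is (relation, key_tuple, rest_tuple)."""
    blocks = {}
    for fact in db:
        blocks.setdefault((fact[0], fact[1]), []).append(fact)
    for choice in product(*blocks.values()):
        yield set(choice)

def satisfies(facts, query, valuation=None):
    """Backtracking test of facts |= query.

    An atom is (relation, terms); a term starting with '?' is a variable,
    any other term is a constant.
    """
    valuation = valuation or {}
    if not query:
        return True
    (rel, terms), rest = query[0], query[1:]
    for frel, key, nonkey in facts:
        if frel != rel or len(key) + len(nonkey) != len(terms):
            continue
        extended = dict(valuation)
        if all(extended.setdefault(t, v) == v if t.startswith("?") else t == v
               for t, v in zip(terms, key + nonkey)):
            if satisfies(facts, rest, extended):
                return True
    return False

def certain(db, query):
    """Exponential-time check that every repair of db satisfies query."""
    return all(satisfies(r, query) for r in repairs(db))

# Hypothetical data, consistent with the description of Fig. 1.
db = {
    ("C", ("PODS", "2016"), ("Rome",)),
    ("C", ("PODS", "2016"), ("Paris",)),
    ("C", ("KDD", "2017"), ("Rome",)),
    ("R", ("PODS",), ("A",)),
    ("R", ("KDD",), ("A",)),
    ("R", ("KDD",), ("B",)),
}
query = [("C", ("?x", "?y", "Rome")), ("R", ("?x", "A"))]
n_true = sum(satisfies(r, query) for r in repairs(db))
```

On this input, `n_true` counts the repairs that satisfy the query, and `certain(db, query)` is false because one repair falsifies it.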

Purified uncertain databases. Let q be a Boolean conjunctive
query. An uncertain database db is said to be purified
relative to q if for every fact $A \in \mathbf{db}$, there exists a valuation
$\theta$ over vars(q) such that $A \in \theta(q) \subseteq \mathbf{db}$. Intuitively, every fact
in a purified uncertain database is relevant for the query. This
notion of purified database is new and illustrated next.

Example 1 The uncertain database {R(a,b), S(b,a), S(b,c)}
is not purified relative to query {R(x,y), S(y,x)} because it
contains no R-fact that "joins" with S(b,c). ◁
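
The test of Example 1 can be replayed with a small Python sketch. The encoding of facts as (relation, values) and of variables as strings starting with `?` is an assumption of this sketch. It only checks the definition of "purified"; it does not implement the polynomial-time construction behind Lemma 1.

```python
def valuations(query, db, valuation=None):
    """Yield every valuation theta over vars(query) with theta(query) inside db."""
    valuation = valuation or {}
    if not query:
        yield valuation
        return
    (rel, terms), rest = query[0], query[1:]
    for frel, values in db:
        if frel != rel or len(values) != len(terms):
            continue
        extended = dict(valuation)
        if all(extended.setdefault(t, v) == v if t.startswith("?") else t == v
               for t, v in zip(terms, values)):
            yield from valuations(rest, db, extended)

def relevant_facts(db, query):
    """Facts of db that belong to theta(query) for some valuation theta."""
    relevant = set()
    for theta in valuations(query, db):
        for rel, terms in query:
            relevant.add((rel, tuple(theta.get(t, t) for t in terms)))
    return relevant

def is_purified(db, query):
    """db is purified relative to query if every fact is relevant."""
    return relevant_facts(db, query) == set(db)

# Database and query of Example 1.
db = {("R", ("a", "b")), ("S", ("b", "a")), ("S", ("b", "c"))}
query = [("R", ("?x", "?y")), ("S", ("?y", "?x"))]
relevant = relevant_facts(db, query)
```

Here `relevant` misses S(b,c), which is why `is_purified(db, query)` returns false.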

The following lemma implies that in the study of tractabil- 
ity of CERTAINTY (q), we can assume without loss of gen- 
erality that uncertain databases are purified; this assumption 
will simplify the technical treatment. Notice that the query q 
in the lemma's statement is not required to be acyclic. 

Lemma 1 Let q be a Boolean conjunctive query. Let $\mathbf{db}_0$ be
an uncertain database. It is possible to compute in polynomial
time an uncertain database db that is purified relative to q
such that

$$\mathbf{db} \in \mathrm{CERTAINTY}(q) \iff \mathbf{db}_0 \in \mathrm{CERTAINTY}(q).$$

4 Attack Graph 

The primary key of an atom F gives rise to a functional
dependency among the variables that occur in F. For example,
$R(\underline{x, y}, z, u)$ gives rise to $\{x,y\} \to \{x,y,z,u\}$, which will be
abbreviated as $xy \to xyzu$ (and which is equivalent to $xy \to zu$).
The set $\mathcal{K}(q)$ defined next collects all functional
dependencies that arise in atoms of q.

Definition 1 Let q be a Boolean conjunctive query. We define
$\mathcal{K}(q)$ as the following set of functional dependencies.

$$\mathcal{K}(q) = \{\mathrm{key}(F) \to \mathrm{vars}(F) \mid F \in q\}$$ ◁

Concerning the following definition, recall from relational
database theory [19, page 387] that if $\Sigma$ is a set of functional
dependencies over a set U of attributes and $X \subseteq U$, then the
attribute closure of X (with respect to $\Sigma$) is the set
$\{A \in U \mid \Sigma \models X \to A\}$.

Definition 2 Let q be a Boolean conjunctive query. For every
$F \in q$, we define $F^{+,q}$ as the following set of variables.

$$F^{+,q} = \{x \in \mathrm{vars}(q) \mid \mathcal{K}(q \setminus \{F\}) \models \mathrm{key}(F) \to x\}$$ ◁

[Figure 2 is not reproduced. In the join tree, the edge between F = R(u,a,x) and G = S(y,x,z) has label {x}, the edge between G and H = T(x,y) has label {x,y}, and the edge between G and I = P(x,z) has label {x,z}.]

Figure 2: Join tree (left) and attack graph (right) of query $q_1$. The attack from G to F is strong. All other attacks are weak.




In words, $F^{+,q}$ is the attribute closure of the set key(F) with
respect to the set of functional dependencies that arise in the
atoms of $q \setminus \{F\}$. Note that variables play the role of attributes
in our framework.

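Example 2 Let $q_1 = \{R(\underline{u}, a, x), S(\underline{y}, x, z), T(\underline{x}, y), P(\underline{x}, z)\}$.
A join tree for this query is shown in Fig. 2 (left). To shorten
notation, let F = R(u,a,x), G = S(y,x,z), H = T(x,y), and
I = P(x,z), as indicated in the figure. We have the following.

$$\begin{aligned}
\mathcal{K}(q_1 \setminus \{F\}) &= \{y \to xyz,\ x \to xy,\ x \to xz\} \\
\mathrm{key}(F) &= \{u\} \text{ and } F^{+,q_1} = \{u\} \\
\mathcal{K}(q_1 \setminus \{G\}) &= \{u \to ux,\ x \to xy,\ x \to xz\} \\
\mathrm{key}(G) &= \{y\} \text{ and } G^{+,q_1} = \{y\} \\
\mathcal{K}(q_1 \setminus \{H\}) &= \{u \to ux,\ y \to xyz,\ x \to xz\} \\
\mathrm{key}(H) &= \{x\} \text{ and } H^{+,q_1} = \{x, z\} \\
\mathcal{K}(q_1 \setminus \{I\}) &= \{u \to ux,\ y \to xyz,\ x \to xy\} \\
\mathrm{key}(I) &= \{x\} \text{ and } I^{+,q_1} = \{x, y, z\}
\end{aligned}$$ ◁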
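
The closures of Example 2 can be recomputed mechanically. The following Python sketch uses the textbook fixpoint computation of attribute closure; the dictionary encoding of $q_1$ (atom name mapped to its key variables and all its variables) is an assumption of this sketch, and the function names are illustrative only.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under functional dependencies (lhs, rhs)."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closed and not rhs <= closed:
                closed |= rhs
                changed = True
    return closed

# Atoms of q1 as name -> (key variables, all variables).
# The constant a in R(u, a, x) contributes no variable.
Q1 = {
    "F": ({"u"}, {"u", "x"}),
    "G": ({"y"}, {"x", "y", "z"}),
    "H": ({"x"}, {"x", "y"}),
    "I": ({"x"}, {"x", "z"}),
}

def plus(atom, query):
    """F^{+,q}: closure of key(F) under the dependencies of all other atoms."""
    fds = [(key, vs) for name, (key, vs) in query.items() if name != atom]
    return closure(query[atom][0], fds)

plus_sets = {name: plus(name, Q1) for name in Q1}
```

The resulting `plus_sets` agrees with the four closures listed in Example 2.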

Definition 3 Let q be an acyclic Boolean conjunctive query.
Let $\tau$ be a join tree for q. The attack graph of $\tau$ is a directed
graph whose vertices are the atoms of q. There is a directed
edge from F to G if F, G are distinct atoms such that for every
label L on the unique path that links F and G in $\tau$, we have
$L \nsubseteq F^{+,q}$.

We write $F \stackrel{\tau}{\rightsquigarrow} G$ if the attack graph of $\tau$ contains a directed
edge from F to G. The directed edge $F \stackrel{\tau}{\rightsquigarrow} G$ is also called an
attack from F to G. If $F \stackrel{\tau}{\rightsquigarrow} G$, we say that F attacks G (or that
G is attacked by F). ◁

Example 3 This is a continuation of Example 2. Fig. 2 (left)
shows a join tree $\tau_1$ for query $q_1$. The attack graph of $\tau_1$ is
shown in Fig. 2 (right) and is computed as follows.

Let us first compute the attacks outgoing from F. The path
from F to G in the join tree is $F \stackrel{\{x\}}{\frown} G$. Since the label {x}
is not contained in $F^{+,q_1}$, the attack graph contains a directed
edge from F to G, i.e., $F \stackrel{\tau_1}{\rightsquigarrow} G$. The path from F to H in
the join tree is $F \stackrel{\{x\}}{\frown} G \stackrel{\{x,y\}}{\frown} H$. Since no label on that path is
contained in $F^{+,q_1}$, the attack graph contains a directed edge
from F to H. In the same way, one finds that F attacks I.

Let us next compute the attacks outgoing from H. The path
from H to G in the join tree is $H \stackrel{\{x,y\}}{\frown} G$. Since the label {x,y}
is not contained in $H^{+,q_1}$, the attack graph contains a directed
edge from H to G, i.e., $H \stackrel{\tau_1}{\rightsquigarrow} G$. The path from H to F in the
join tree is $H \stackrel{\{x,y\}}{\frown} G \stackrel{\{x\}}{\frown} F$. Since the label {x} is contained in
$H^{+,q_1}$, the attack graph contains no directed edge from H to
F. And so on. The complete attack graph is shown in Fig. 2
(right). ◁
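
The computation of Example 3 can be carried through for all pairs of atoms with the Python sketch below. It applies Definition 3 directly: F attacks G when no label on the join-tree path from F to G is contained in $F^{+,q_1}$. The closures are those of Example 2 and the labels are those of the join tree in Fig. 2 (left); the data structures and function names are assumptions of this sketch, and no attempt is made to reach the quadratic running time mentioned later in this section.

```python
# Closures F^{+,q1} as listed in Example 2.
PLUS = {"F": {"u"}, "G": {"y"}, "H": {"x", "z"}, "I": {"x", "y", "z"}}

# Join tree of q1: undirected edges with their labels.
TREE = {
    frozenset({"F", "G"}): {"x"},
    frozenset({"G", "H"}): {"x", "y"},
    frozenset({"G", "I"}): {"x", "z"},
}

def path_labels(tree, start, goal, visited=None):
    """Labels on the unique path from start to goal, or None if unreachable."""
    if start == goal:
        return []
    visited = (visited or set()) | {start}
    for edge, label in tree.items():
        if start in edge:
            (nxt,) = edge - {start}
            if nxt not in visited:
                tail = path_labels(tree, nxt, goal, visited)
                if tail is not None:
                    return [label] + tail
    return None

def attack_graph(tree, plus):
    """Pairs (F, G) such that no label on the F-G path is a subset of F^{+,q}."""
    return {
        (f, g)
        for f in plus for g in plus
        if f != g
        and all(not label <= plus[f] for label in path_labels(tree, f, g))
    }

ATTACKS = attack_graph(TREE, PLUS)
```

With these inputs, `ATTACKS` contains the attacks from F to G, H, and I, from G to F, H, and I, and from H to G; the atom I attacks no other atom.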

Remarkably, it was shown in [23] that if $\tau_1$ and $\tau_2$ are distinct
join trees for the same acyclic query q, then the attack
graph of $\tau_1$ is identical to the attack graph of $\tau_2$. This motivates
the following definition.

Definition 4 Let q be an acyclic Boolean conjunctive query.
The attack graph of q is the attack graph of $\tau$ for any join tree
$\tau$ for q. We write $F \stackrel{q}{\rightsquigarrow} G$ (or simply $F \rightsquigarrow G$ if q is clear from
the context) to indicate that the attack graph of q contains a
directed edge from F to G. We write $F \not\rightsquigarrow G$ if it is not the
case that $F \rightsquigarrow G$. ◁

The attack graph of an acyclic query q can be computed in
quadratic time in the length of q [23]. Figures 4 and 5 show
attack graphs, but omit join trees. The main result in [23] is
the following.

Theorem 1 ([23]) The following are equivalent for all
acyclic Boolean conjunctive queries q without self-join:

1. The attack graph of q is acyclic.

2. CERTAINTY(q) is first-order expressible.

Finally, we provide two lemmas that will be useful later on. 

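Lemma 2 Let q be an acyclic Boolean conjunctive query. Let
F, G be distinct atoms of q. If $F \rightsquigarrow G$, then $\mathrm{key}(G) \nsubseteq F^{+,q}$ and
$\mathrm{vars}(F) \nsubseteq F^{+,q}$.

Lemma 3 ([23]) Let q be an acyclic Boolean conjunctive
query. Let F, G, H be distinct atoms of q. If $F \rightsquigarrow G$ and
$G \rightsquigarrow H$, then $F \rightsquigarrow H$ or $G \rightsquigarrow F$.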

5 Intractability

The following definition classifies cycles in attack graphs as 
either strong or weak. The main result of this section is 
that CERTAINTY(g) is coNP-complete for acyclic queries q 
whose attack graph contains a strong cycle. 

Definition 5 Let q be an acyclic Boolean conjunctive query.
For every $F \in q$, we define $F^{\boxplus,q}$ as the following set of
variables.

$$F^{\boxplus,q} = \{x \in \mathrm{vars}(q) \mid \mathcal{K}(q) \models \mathrm{key}(F) \to x\}$$

An attack $F \rightsquigarrow G$ in the attack graph of q is called weak if
$\mathrm{key}(G) \subseteq F^{\boxplus,q}$. An attack that is not weak is called strong.

A (directed) cycle of size n in the attack graph of q is a
sequence of edges $F_0 \rightsquigarrow F_1 \rightsquigarrow F_2 \dots \rightsquigarrow F_{n-1} \rightsquigarrow F_0$ such that
$i \neq j$ implies $F_i \neq F_j$. Thus, cycle means elementary cycle.

A cycle in the attack graph of q is called strong if at least
one attack in the cycle is strong. A cycle that is not strong is
called weak. ◁

It is straightforward that $F^{+,q} \subseteq F^{\boxplus,q}$.

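Example 4 For the query $q_1$ in Fig. 2 we have the following.

$$\begin{aligned}
\mathcal{K}(q_1) &= \{u \to ux,\ y \to xyz,\ x \to xy,\ x \to xz\} \\
F^{\boxplus,q_1} &= \{u, x, y, z\} \\
G^{\boxplus,q_1} &= \{x, y, z\} \\
H^{\boxplus,q_1} &= \{x, y, z\} \\
I^{\boxplus,q_1} &= \{x, y, z\}
\end{aligned}$$

The attack $F \rightsquigarrow G$ is weak, because $\mathrm{key}(G) = \{y\} \subseteq F^{\boxplus,q_1}$.
The attack $G \rightsquigarrow F$ is strong, because $\mathrm{key}(F) = \{u\} \nsubseteq G^{\boxplus,q_1}$.
One can verify that the attack from G to F is the only strong
attack in the attack graph of $q_1$.

The attack cycle $G \rightsquigarrow H \rightsquigarrow G$ is weak. The attack cycle
$F \rightsquigarrow G \rightsquigarrow F$ is strong, because it contains the strong attack
$G \rightsquigarrow F$. For the same reason, the attack cycle
$F \rightsquigarrow H \rightsquigarrow G \rightsquigarrow F$ is strong. ◁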
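
The classification of Example 4 can be checked with a short Python sketch. It computes $F^{\boxplus,q_1}$ as the closure of key(F) under all dependencies of $\mathcal{K}(q_1)$ and then tests the weakness condition of Definition 5. The encodings are assumptions of this sketch, and the set of attacks is the attack graph of $q_1$ obtained by applying Definition 3 to the join tree of Fig. 2.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under functional dependencies (lhs, rhs)."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closed and not rhs <= closed:
                closed |= rhs
                changed = True
    return closed

KEYS = {"F": {"u"}, "G": {"y"}, "H": {"x"}, "I": {"x"}}
VARS = {"F": {"u", "x"}, "G": {"x", "y", "z"}, "H": {"x", "y"}, "I": {"x", "z"}}
ATTACKS = {("F", "G"), ("F", "H"), ("F", "I"),
           ("G", "F"), ("G", "H"), ("G", "I"), ("H", "G")}

# K(q1): one dependency key(F) -> vars(F) per atom, with no atom left out.
FDS = [(KEYS[a], VARS[a]) for a in KEYS]
BOX = {a: closure(KEYS[a], FDS) for a in KEYS}

def is_weak(attack):
    """An attack from f to g is weak if key(g) is contained in the box set of f."""
    f, g = attack
    return KEYS[g] <= BOX[f]

STRONG = {attack for attack in ATTACKS if not is_weak(attack)}
```

The only element of `STRONG` is the attack from G to F, as stated in the example.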

Example 4 showed that the attack graph of $q_1$ has a strong
cycle of length 3, and a strong cycle of length 2. This is no
coincidence, as stated by the following lemma.

Lemma 4 Let q be an acyclic Boolean conjunctive query. If 
the attack graph of q contains a strong cycle, then it contains 
a strong cycle of length 2. 
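
For the attack graph of $q_1$, the statement of Lemma 4 can be observed by enumerating all elementary cycles, as in the Python sketch below. The enumeration strategy (each cycle is reported once, starting from its smallest vertex) is a choice made for this illustration and is not part of the paper; the attack sets are those discussed in Example 4.

```python
ATTACKS = {("F", "G"), ("F", "H"), ("F", "I"),
           ("G", "F"), ("G", "H"), ("G", "I"), ("H", "G")}
STRONG = {("G", "F")}  # the only strong attack, see Example 4

def elementary_cycles(edges):
    """Yield each elementary cycle once, as a tuple starting at its least vertex."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)

    def extend(path):
        for nxt in succ.get(path[-1], []):
            if nxt == path[0]:
                yield tuple(path)
            elif nxt > path[0] and nxt not in path:
                yield from extend(path + [nxt])

    for start in sorted(succ):
        yield from extend([start])

def is_strong(cycle):
    """A cycle is strong if at least one of its attacks is strong."""
    steps = zip(cycle, cycle[1:] + cycle[:1])
    return any(step in STRONG for step in steps)

CYCLES = set(elementary_cycles(ATTACKS))
STRONG_CYCLES = {c for c in CYCLES if is_strong(c)}
```

The graph has three elementary cycles; the two strong ones have lengths 2 and 3, so a strong cycle of length 2 indeed exists.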

The following proof establishes that for every acyclic query
q whose attack graph contains a strong cycle, there exists a
polynomial-time many-one reduction from CERTAINTY($q_0$)
to CERTAINTY(q), where $q_0 = \{R_0(\underline{x}, y), S_0(\underline{y, z}, x)\}$. Since
CERTAINTY($q_0$) was proved coNP-hard by Kolaitis and
Pema [13], we obtain the desired coNP-hard lower bound for
CERTAINTY(q). As the proof is rather involved, we provide
in Fig. 3 a mnemonic for the construction in the beginning of
the proof. To further improve readability, some parts of the
proof will be stated as sublemmas.



Figure 3: Help for the proof of Theorem 2.



Theorem 2 Let q be an acyclic Boolean conjunctive query
without self-join. If the attack graph of q contains a strong
cycle, then CERTAINTY(q) is coNP-complete.

Proof Since CERTAINTY(q) is obviously in coNP, it suffices
to show that it is coNP-hard. Assume that the attack
graph of q contains a strong cycle. By Lemma 4, we can assume
$F, G \in q$ such that $F \rightsquigarrow G \rightsquigarrow F$ and the attack $F \rightsquigarrow G$ is
strong. For every valuation $\theta$ over {x,y,z}, we define $\hat{\theta}$ as the
following valuation over vars(q).

1. If $u \in F^{+,q} \cap G^{+,q}$, then $\hat{\theta}(u) = \text{'d'}$ for some fixed
constant d;

2. if $u \in F^{+,q} \setminus G^{+,q}$, then $\hat{\theta}(u) = \theta(x)$;

3. if $u \in G^{+,q} \setminus F^{\boxplus,q}$, then $\hat{\theta}(u) = \langle \theta(y), \theta(z) \rangle$;

4. if $u \in (G^{+,q} \cap F^{\boxplus,q}) \setminus F^{+,q}$, then $\hat{\theta}(u) = \theta(y)$;

5. if $u \in F^{\boxplus,q} \setminus (F^{+,q} \cup G^{+,q})$, then $\hat{\theta}(u) = \langle \theta(x), \theta(y) \rangle$; and

6. if $u \notin F^{\boxplus,q} \cup G^{+,q}$, then $\hat{\theta}(u) = \langle \theta(x), \theta(y), \theta(z) \rangle$.

Notice that $\hat{\theta}(u)$ can be a sequence of length two or three;
two sequences of the same length are equal if they contain the
same elements in the same order. The Venn diagram of Fig. 3
will come in handy: every region contains a boxed label that
indicates how $\hat{\theta}(u)$ is computed for variables u in that region.
For example, assume u belongs to the region with label $\langle x, y \rangle$
(i.e., $u \in F^{\boxplus,q} \setminus (F^{+,q} \cup G^{+,q})$), then $\hat{\theta}(u) = \langle \theta(x), \theta(y) \rangle$.

We show three sublemmas that will be used later on in the 
proof. 

Sublemma 1 Let $\theta_1, \theta_2$ be two valuations over {x,y,z}. If
$H \in q$ such that $F \neq H \neq G$, then $\{\hat{\theta}_1(H), \hat{\theta}_2(H)\}$ is
consistent.

Proof of Sublemma 1 Let $H \in q$ such that $F \neq H \neq G$. Assume
the following.

For every $u \in \mathrm{key}(H)$, $\hat{\theta}_1(u) = \hat{\theta}_2(u)$. (1)

It suffices to show the following.

For every $u \in \mathrm{vars}(H)$, $\hat{\theta}_1(u) = \hat{\theta}_2(u)$. (2)

We consider four cases.

Case $\theta_1(x) = \theta_2(x)$ and $\theta_1(y) = \theta_2(y)$. If $\theta_1(z) = \theta_2(z)$,
then $\theta_1 = \theta_2$, and (2) holds vacuously. Assume next
$\theta_1(z) \neq \theta_2(z)$. Then it follows from (1) that no variable of key(H)
belongs to a region of the Venn diagram (see Fig. 3) that contains
z. Since z occurs in all regions outside $F^{\boxplus,q}$, we conclude
$\mathrm{key}(H) \subseteq F^{\boxplus,q}$. Since $\mathcal{K}(q)$ contains $\mathrm{key}(H) \to \mathrm{vars}(H)$, it
follows $\mathrm{vars}(H) \subseteq F^{\boxplus,q}$. Since z does not occur inside $F^{\boxplus,q}$
in the Venn diagram, we conclude (2).

Case $\theta_1(x) = \theta_2(x)$ and $\theta_1(y) \neq \theta_2(y)$. By (1), no variable
of key(H) belongs to a region of the Venn diagram
that contains y. It follows $\mathrm{key}(H) \subseteq F^{+,q}$. Consequently,
$\mathrm{vars}(H) \subseteq F^{+,q}$. Since neither y nor z occurs inside $F^{+,q}$ in
the Venn diagram, we conclude (2).

Case $\theta_1(x) \neq \theta_2(x)$ and $\theta_1(y) = \theta_2(y)$. First assume
$\theta_1(z) = \theta_2(z)$. By (1), no variable of key(H) belongs to a
region of the Venn diagram that contains x. Consequently,
$\mathrm{key}(H) \subseteq G^{+,q}$. It follows $\mathrm{vars}(H) \subseteq G^{+,q}$. Since x does not
occur inside $G^{+,q}$ in the Venn diagram, we conclude (2).

Next assume $\theta_1(z) \neq \theta_2(z)$. By (1), no variable of key(H)
belongs to a region of the Venn diagram that contains x or z.
Consequently, $\mathrm{key}(H) \subseteq F^{\boxplus,q} \cap G^{+,q}$. It follows
$\mathrm{vars}(H) \subseteq F^{\boxplus,q} \cap G^{+,q}$. Since neither x nor z occurs inside
$F^{\boxplus,q} \cap G^{+,q}$ in the Venn diagram, we conclude (2).

Case $\theta_1(x) \neq \theta_2(x)$ and $\theta_1(y) \neq \theta_2(y)$. By (1), no variable
of key(H) belongs to a region of the Venn diagram that
contains x or y. Consequently, $\mathrm{key}(H) \subseteq F^{+,q} \cap G^{+,q}$. It
follows $\mathrm{vars}(H) \subseteq F^{+,q} \cap G^{+,q}$. Since none of x, y, or z occurs
inside $F^{+,q} \cap G^{+,q}$ in the Venn diagram, we conclude (2).
This concludes the proof of Sublemma 1. ⊣

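Sublemma 2 Let $\theta_1, \theta_2$ be two valuations over {x,y,z}.

1. $\hat{\theta}_1(F)$ and $\hat{\theta}_2(F)$ are key-equal $\iff$ $\theta_1(x) = \theta_2(x)$.

2. $\hat{\theta}_1(F) = \hat{\theta}_2(F)$ $\iff$ $\theta_1(x) = \theta_2(x)$ and $\theta_1(y) = \theta_2(y)$.

Sublemma 3 Let $\theta_1, \theta_2$ be two valuations over {x,y,z}.

1. $\hat{\theta}_1(G)$ and $\hat{\theta}_2(G)$ are key-equal $\iff$ $\theta_1(y) = \theta_2(y)$ and
$\theta_1(z) = \theta_2(z)$.

2. $\hat{\theta}_1(G) = \hat{\theta}_2(G)$ $\iff$ $\theta_1 = \theta_2$.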

We continue the proof of Theorem 2. Let $q_0 = \{R_0(\underline{x}, y),
S_0(\underline{y, z}, x)\}$. The signatures of $R_0$ and $S_0$ are [2,1] and [3,2],
respectively. Let $F_0 = R_0(\underline{x}, y)$ and $G_0 = S_0(\underline{y, z}, x)$. In the
remainder of the proof, we establish a polynomial-time many-one
reduction from CERTAINTY($q_0$) to CERTAINTY(q).
coNP-hardness of CERTAINTY(q) then follows from
coNP-hardness of CERTAINTY($q_0$), which was established in [13].

Let $\mathbf{db}_0$ be an uncertain database. By Lemma 1, we can
assume that $\mathbf{db}_0$ is purified relative to $q_0$. Let V be the set of
valuations $\theta$ over {x,y,z} such that $\theta(q_0) \subseteq \mathbf{db}_0$. Since $\mathbf{db}_0$ is
purified, the following holds.

$$\mathbf{db}_0 = \{\theta(F_0) \mid \theta \in V\} \cup \{\theta(G_0) \mid \theta \in V\}$$

Let $\mathbf{db} = \{\hat{\theta}(H) \mid H \in q, \theta \in V\}$. Since V can be computed
in polynomial time in the size of $\mathbf{db}_0$, the reduction from $\mathbf{db}_0$
to db is in polynomial time. Since q contains no self-join, the
set db is partitioned by the three disjoint subsets defined next.

$$\begin{aligned}
\mathbf{db}_F &= \{\hat{\theta}(F) \mid \theta \in V\} \\
\mathbf{db}_G &= \{\hat{\theta}(G) \mid \theta \in V\} \\
\mathbf{db}_{rest} &= \{\hat{\theta}(H) \mid H \in q,\ F \neq H \neq G,\ \theta \in V\}
\end{aligned}$$

Since $\mathbf{db}_{rest}$ is consistent by Sublemma 1, every repair of db
is the disjoint union of $\mathbf{db}_{rest}$, a repair of $\mathbf{db}_F$, and a repair of
$\mathbf{db}_G$. In the next step of the proof, we establish a one-to-one
relationship between repairs of $\mathbf{db}_0$ and repairs of db.

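The function map will map repairs of $\mathbf{db}_0$ to repairs of db.
For every repair $r_0$ of $\mathbf{db}_0$, $\mathrm{map}(r_0)$ is the disjoint union of
three sets, as follows.

$$\mathrm{map}(r_0) = \{\hat{\theta}(F) \mid \theta(F_0) \in r_0,\ \theta \in V\} \cup \{\hat{\theta}(G) \mid \theta(G_0) \in r_0,\ \theta \in V\} \cup \mathbf{db}_{rest}$$

Clearly, the first of these three sets is contained in $\mathbf{db}_F$, and
the second in $\mathbf{db}_G$. By Sublemmas 2 and 3, for every $\theta \in V$,

$\theta(F_0) \in r_0 \iff \hat{\theta}(F) \in \mathrm{map}(r_0)$ (3)

$\theta(G_0) \in r_0 \iff \hat{\theta}(G) \in \mathrm{map}(r_0)$ (4)

To prove the $\Longleftarrow$-direction of (3) (the other implications are
straightforward), assume $A \in \mathrm{map}(r_0)$ with $A = \hat{\theta}(F)$. By the
definition of map, we can assume $\theta' \in V$ such that $\theta'(F_0) \in r_0$
and $\hat{\theta}'(F) = A$. From $\hat{\theta}(F) = \hat{\theta}'(F)$, it follows by Sublemma 2
that $\theta(F_0) = \theta'(F_0)$, hence $\theta(F_0) \in r_0$.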

The following sublemma states that map is a bijection from
the set of repairs of $\mathbf{db}_0$ to the set of repairs of db.

Sublemma 4

1. If $r_0$ is a repair of $\mathbf{db}_0$, then $\mathrm{map}(r_0)$ is a
repair of db.

2. For every repair r of db, there exists a repair $r_0$ of $\mathbf{db}_0$
such that $r = \mathrm{map}(r_0)$.

3. If $r_0, r_0'$ are distinct repairs of $\mathbf{db}_0$, then
$\mathrm{map}(r_0) \neq \mathrm{map}(r_0')$.

To conclude the proof of Theorem 2, we show:

$$\mathbf{db}_0 \in \mathrm{CERTAINTY}(q_0) \iff \mathbf{db} \in \mathrm{CERTAINTY}(q).$$

By Sublemma 4, it is sufficient to prove that for every repair
$r_0$ of $\mathbf{db}_0$,

$$r_0 \models q_0 \iff \mathrm{map}(r_0) \models q.$$

($\Longrightarrow$) Assume $r_0 \models q_0$. We can assume $\theta \in V$ such that
$\theta(q_0) \subseteq r_0$. Obviously, $\hat{\theta}(q) \subseteq \mathrm{map}(r_0)$.

($\Longleftarrow$) Assume $\mathrm{map}(r_0) \models q$. We can assume a valuation
$\mu$ over vars(q) such that $\mu(q) \subseteq \mathrm{map}(r_0)$.

Let $\tau$ be a join tree for q. Let $H_0 \stackrel{L_0}{\frown} H_1 \dots \stackrel{L_{\ell-1}}{\frown} H_\ell$ be the unique
path in $\tau$ between F and G, where $H_0 = F$ and $H_\ell = G$. For
$i \in \{0, \dots, \ell\}$, we can assume $\theta_i \in V$ such that
$\mu(H_i) = \hat{\theta}_i(H_i) \in \mathrm{map}(r_0)$. Let $i \in \{0, \dots, \ell - 1\}$. We show
$\theta_i(x) = \theta_{i+1}(x)$ and $\theta_i(y) = \theta_{i+1}(y)$. Since
$F \rightsquigarrow G \rightsquigarrow F$, the label $L_i$ contains a variable $u_i$ such that
$u_i \notin F^{+,q}$ and a variable $w_i$ such that $w_i \notin G^{+,q}$
(possibly $u_i = w_i$).

Since $u_i \in \mathrm{vars}(H_i) \cap \mathrm{vars}(H_{i+1})$, it must be the case that
$\hat{\theta}_i(u_i) = \mu(u_i) = \hat{\theta}_{i+1}(u_i)$. Since y occurs in every region
outside $F^{+,q}$ in the Venn diagram (Fig. 3) and $u_i \notin F^{+,q}$, it is
correct to conclude $\theta_i(y) = \theta_{i+1}(y)$.

Likewise, since $w_i \in \mathrm{vars}(H_i) \cap \mathrm{vars}(H_{i+1})$, it must be the
case that $\hat{\theta}_i(w_i) = \mu(w_i) = \hat{\theta}_{i+1}(w_i)$. Since x occurs in every
region outside $G^{+,q}$ in the Venn diagram and $w_i \notin G^{+,q}$, it is
correct to conclude $\theta_i(x) = \theta_{i+1}(x)$.

Consequently, $\theta_0(x) = \theta_\ell(x)$ and $\theta_0(y) = \theta_\ell(y)$. From
$\hat{\theta}_0(H_0), \hat{\theta}_\ell(H_\ell) \in \mathrm{map}(r_0)$, $H_0 = F$, and $H_\ell = G$, it follows
$\theta_0(F_0), \theta_\ell(G_0) \in r_0$ by (3) and (4). Since $\theta_0$ and $\theta_\ell$ agree
on each variable in $\mathrm{vars}(F_0) \cap \mathrm{vars}(G_0) = \{x, y\}$, it follows
$r_0 \models q_0$. This concludes the proof of Theorem 2. □

6 Tractability 

We conjecture that if the attack graph of an acyclic query q
contains no strong cycle, then CERTAINTY(q) is in P.

Conjecture 1 Let q be an acyclic Boolean conjunctive query
without self-join. If all cycles in the attack graph of q are
weak, then CERTAINTY(q) is in P.

Notice that by Theorem 1, we know that Conjecture 1 holds
in the special case where q's attack graph contains no cycle at
all. Theorem 2 and Conjecture 1 together imply that for every
acyclic query q, CERTAINTY(q) is either in P or coNP-complete.
In the following section, a somewhat weaker version
of Conjecture 1 is proved.

6.1 All Cycles are Weak and Terminal 

We show a weaker version of Conjecture 1. In this weaker
version, the premise "all cycles are weak" is strengthened into
"all cycles are weak and terminal."

Definition 6 A cycle in a directed graph is called terminal if 
the graph contains no directed edge from a vertex in the cycle 
to a vertex outside the cycle. A cycle is nonterminal if it is
not terminal. ◁
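
Definition 6 translates into a one-line test, sketched below in Python. The first graph is the attack graph of $q_1$ from Section 4; the second graph is a hypothetical toy graph added only to show a terminal cycle. Edges that enter a cycle from outside do not affect terminality.

```python
def is_terminal(cycle, edges):
    """True if no edge leads from a vertex of the cycle to a vertex outside it."""
    inside = set(cycle)
    return all(b in inside for a, b in edges if a in inside)

# Attack graph of q1 (see Examples 3 and 4).
Q1_ATTACKS = {("F", "G"), ("F", "H"), ("F", "I"),
              ("G", "F"), ("G", "H"), ("G", "I"), ("H", "G")}

# Hypothetical graph: A -> B, B -> C, C -> B.
TOY = {("A", "B"), ("B", "C"), ("C", "B")}

q1_terminal = {c: is_terminal(c, Q1_ATTACKS)
               for c in [("F", "G"), ("G", "H"), ("F", "H", "G")]}
toy_terminal = is_terminal(("B", "C"), TOY)
```

None of the three cycles of $q_1$ is terminal (each has an outgoing edge to a vertex outside the cycle), whereas the cycle on B and C in the toy graph is terminal.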

Example 5 Figure 4 shows the attack graph of the
acyclic query $\{R_1(x, u_1, u_2, z), R_2(x, u_2, u_1, z), R_3(x, y, u_3, u_4),
R_4(x, y, u_4, u_3), R_5(y, u_5, u_6), R_6(y, u_6, u_5)\}$. All attack cycles
are terminal and weak. ◁

Figure 4: Attack graph. All cycles are weak and terminal.

Example 6 In the attack graph of Fig. 5, all cycles are weak, but no cycle is terminal. ◁

Theorem 3 Let q be an acyclic Boolean conjunctive query 
without self-join. If all cycles in the attack graph of q are 
weak and terminal, then CERTAINTY (q) is in P. 

Notice that if a query q has exactly two atoms, then q is acyclic and every cycle in q's attack graph must be terminal. Therefore Theorems 2 and 3 together imply the dichotomy theorem of Kolaitis and Pema [13].

To prove Theorem 3, we need four helping lemmas. In simple words, the first lemma states that if we replace a variable with a constant in an acyclic query, then no new attacks are generated, and weak attacks cannot become strong.

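Definition 7 Let q be a Boolean conjunctive query. If $\vec{x} = (x_1, \dots, x_\ell)$ is a sequence of distinct variables and $\vec{a} = (a_1, \dots, a_\ell)$ a sequence of constants, then $q_{[\vec{x} \mapsto \vec{a}]}$ denotes the query obtained from q by replacing each occurrence of $x_i$ with $a_i$, for all $i \in \{1, \dots, \ell\}$. If $\theta$ is a valuation, then $\theta_{[\vec{x} \mapsto \vec{a}]}$ is the valuation such that $\theta_{[\vec{x} \mapsto \vec{a}]}(\vec{x}) = \vec{a}$ and $\theta_{[\vec{x} \mapsto \vec{a}]}(y) = \theta(y)$ if $y \notin \mathsf{vars}(\vec{x})$. ◁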
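As an illustration of Definition 7, the two substitutions can be sketched in Python. The encoding is an assumption made for this sketch: an atom is a pair (relation name, tuple of terms), a query is a set of atoms, variables are strings prefixed with `?`, and a valuation is a dictionary.

```python
def substitute(query, xs, consts):
    """Return query[xs -> consts]: each occurrence of xs[i] becomes consts[i]."""
    assert len(xs) == len(set(xs)) == len(consts), "xs must be distinct, same length"
    mapping = dict(zip(xs, consts))
    return {(rel, tuple(mapping.get(t, t) for t in terms)) for rel, terms in query}


def extend_valuation(theta, xs, consts):
    """Return theta[xs -> consts]: maps xs to consts, agrees with theta elsewhere."""
    updated = dict(theta)
    updated.update(zip(xs, consts))
    return updated


# Hypothetical two-atom query, used only for illustration.
q = {("R", ("?x", "?y")), ("S", ("?y", "?z"))}
print(substitute(q, ["?y"], ["b"]))
print(extend_valuation({"?x": 1, "?y": 2}, ["?y"], [9]))
```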

Lemma 5 Let q be an acyclic Boolean conjunctive query without self-join. Let $F, G \in q$. Let $z \in \mathsf{vars}(q)$ and let c be a constant. Let $q' = q_{[z \mapsto c]}$, $F' = F_{[z \mapsto c]}$, and $G' = G_{[z \mapsto c]}$. Then, the following hold.

1. $q'$ is acyclic.

2. If $F' \rightsquigarrow G'$, then $F \rightsquigarrow G$.

3. If $F' \rightsquigarrow G'$ and $F \rightsquigarrow G$ is a weak attack, then $F' \rightsquigarrow G'$ is a weak attack.

Lemma 6 Let q be an acyclic Boolean conjunctive query. If 
each cycle in the attack graph of q is terminal, then each cycle 
in the attack graph has length 2. 

Lemma 7 Let q be an acyclic Boolean conjunctive query 
such that each cycle of the attack graph of q is terminal and 
each atom of q belongs to a cycle of the attack graph. 
1. If the same variable x occurs in two distinct cycles of the attack graph, then for each atom F in these cycles, $x \in \mathsf{key}(F)$.

2. If $F \rightsquigarrow G$ is a weak attack, then $\mathsf{key}(G) \subseteq \mathsf{vars}(F)$.

The following lemma applies to queries with an atom 
whose primary key contains no variables. 

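Lemma 8 Let q be a Boolean conjunctive query without self-join. Let $F \in q$ such that $\mathsf{key}(F) = \emptyset$. Let $q' = q \setminus \{F\}$. Let $\vec{y}$ be a sequence of distinct variables such that $\mathsf{vars}(\vec{y}) = \mathsf{vars}(F)$. Let db be an uncertain database that is purified relative to q, and let D be the active domain of db. Then the following are equivalent:

1. $\mathbf{db} \in$ CERTAINTY($q$).

2. $\mathbf{db} \neq \emptyset$ and for all $\vec{b} \in D^{|\vec{y}|}$, if $F_{[\vec{y} \mapsto \vec{b}]} \in \mathbf{db}$, then $\mathbf{db} \in$ CERTAINTY($q'_{[\vec{y} \mapsto \vec{b}]}$).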

The proof of Theorem 3 can now be given.

Proof of Theorem 3. Given uncertain database db, we need to show that it can be decided in polynomial time (in the size of db) whether $\mathbf{db} \in$ CERTAINTY($q$). Let D be the active domain of db. By Lemma 1, we can assume that db is purified relative to q.

The proof runs by induction on the length of q. For the base of the induction, we consider the case where the attack graph of q contains no unattacked atom (i.e., no atom has zero indegree). CERTAINTY(q) is obviously in P if q = {}. Assume next that q is nonempty.

Since all cycles of q's attack graph are terminal and every atom has an incoming attack, every atom of q belongs to some cycle of the attack graph. By Lemma 6, the attack graph of q is a set of disjoint weak cycles $F_1 \rightsquigarrow G_1 \rightsquigarrow F_1, \dots, F_\ell \rightsquigarrow G_\ell \rightsquigarrow F_\ell$ for some $\ell \geq 1$. For $i \in \{1, \dots, \ell\}$, let $q_i = \{F_i, G_i\}$, and let $\vec{x}_i$ be a sequence of distinct variables that contains every variable $x \in \mathsf{vars}(q_i)$ such that for some $j \neq i$, $x \in \mathsf{vars}(q_j)$. By Lemma 7, $\mathsf{vars}(\vec{x}_i) \subseteq \mathsf{key}(F_i) \cap \mathsf{key}(G_i)$.

For $i \in \{1, \dots, \ell\}$, let $\mathbf{db}_i$ be the subset of db containing every fact A with the same relation name as $F_i$ or $G_i$. Call a partition of $\mathbf{db}_i$ a maximal subset P of $\mathbf{db}_i$ such that for some $\vec{a} \in D^{|\vec{x}_i|}$, for all $A \in P$, there exists a valuation $\theta$ such that $A = \theta_{[\vec{x}_i \mapsto \vec{a}]}(F_i)$ or $A = \theta_{[\vec{x}_i \mapsto \vec{a}]}(G_i)$. The sequence $\vec{a}$ is called the vector of partition P.

In words, each partition of $\mathbf{db}_i$ groups facts that can be obtained from $F_i$ or $G_i$ by replacing the variables of $\vec{x}_i$ with the same fixed constants. For example, the attack graph in Fig. 4 contains an attack cycle involving $R_3(x,y,u_3,u_4)$ and $R_4(x,y,u_4,u_3)$. The sequence $(x,y)$ contains the variables that also occur in other cycles. The facts $R_3(a,b,c,d)$ and $R_4(a,b,e,f)$ both belong to the partition with vector $(a,b)$.
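The grouping into partitions can be sketched as follows. This is only an illustration under stated assumptions: facts are encoded as (relation name, tuple of constants), the dictionary `atoms` (a name chosen for this sketch) gives the variables of the two atoms of one cycle, and every fact is assumed to match its atom, as is the case in a purified database.

```python
from collections import defaultdict


def partitions(facts, atoms, shared):
    """Group facts by the constants they assign to the shared variables.

    facts:  iterable of (relation, constants) pairs
    atoms:  dict mapping a relation name to its tuple of variables
    shared: the sequence of variables that also occur in other cycles
    """
    groups = defaultdict(set)
    for rel, consts in facts:
        variables = atoms[rel]
        # The vector of the partition: constants found at the shared variables.
        vector = tuple(consts[variables.index(x)] for x in shared)
        groups[vector].add((rel, consts))
    return dict(groups)


# The cycle of R3 and R4 from Fig. 4; the third fact is hypothetical.
atoms = {"R3": ("x", "y", "u3", "u4"), "R4": ("x", "y", "u4", "u3")}
facts = [
    ("R3", ("a", "b", "c", "d")),
    ("R4", ("a", "b", "e", "f")),
    ("R3", ("a", "b2", "c", "d")),
]
result = partitions(facts, atoms, ("x", "y"))
print(result)
```

The first two facts land in the partition with vector (a, b), in line with the example in the text; the third fact starts a partition of its own.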

Clearly, two facts that belong to distinct partitions of $\mathbf{db}_i$ cannot be key-equal. It follows that each repair of $\mathbf{db}_i$ is a disjoint union of repairs, one for each partition of $\mathbf{db}_i$.

Let $[\![\mathbf{db}_i]\!]$ be the smallest subset of $\mathbf{db}_i$ that contains every partition P satisfying $P \in$ CERTAINTY($q_i$). By Lemma [J] and [13, Theorem 2], CERTAINTY($q_i$) is in P for $1 \leq i \leq \ell$. From the following sublemma, it follows that CERTAINTY(q) is in P.

Sublemma 5 The following are equivalent:

1. $\mathbf{db} \in$ CERTAINTY($q$).

2. $\bigcup_{1 \leq i \leq \ell} [\![\mathbf{db}_i]\!] \models q$.

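For the step of the induction, assume that F is an unattacked atom in q's attack graph. Let $\vec{x}$ be a sequence of distinct variables such that $\mathsf{vars}(\vec{x}) = \mathsf{key}(F)$. By Corollary 8.11 in [23], the following are equivalent.

1. $\mathbf{db} \in$ CERTAINTY($q$).

2. For some $\vec{a} \in D^{|\vec{x}|}$, $\mathbf{db} \in$ CERTAINTY($q_{[\vec{x} \mapsto \vec{a}]}$).

Let $\vec{y}$ be a sequence of distinct variables such that $\mathsf{vars}(\vec{y}) = \mathsf{vars}(F) \setminus \mathsf{key}(F)$. Let $q' = q \setminus \{F\}$. By Lemma 1, it is possible to compute in polynomial time a database $\mathbf{db}'$ that is purified relative to $q_{[\vec{x} \mapsto \vec{a}]}$ such that

$$\mathbf{db} \in \text{CERTAINTY}(q_{[\vec{x} \mapsto \vec{a}]}) \iff \mathbf{db}' \in \text{CERTAINTY}(q_{[\vec{x} \mapsto \vec{a}]}).$$

By Lemma 8, the following are equivalent:

1. $\mathbf{db}' \in$ CERTAINTY($q_{[\vec{x} \mapsto \vec{a}]}$).

2. $\mathbf{db}' \neq \emptyset$ and for all $\vec{b} \in D^{|\vec{y}|}$, if $F_{[\vec{x}\vec{y} \mapsto \vec{a}\vec{b}]} \in \mathbf{db}'$, then $\mathbf{db}' \in$ CERTAINTY($q'_{[\vec{x}\vec{y} \mapsto \vec{a}\vec{b}]}$).

By Lemma 5, all cycles in the attack graph of $q'_{[\vec{x}\vec{y} \mapsto \vec{a}\vec{b}]}$ are weak and terminal. By the induction hypothesis, it follows that CERTAINTY($q'_{[\vec{x}\vec{y} \mapsto \vec{a}\vec{b}]}$) is in P. Since the sizes of $D^{|\vec{x}|}$ and $D^{|\vec{y}|}$ are polynomially bounded in the size of db, it is correct to conclude that CERTAINTY(q) is in P. □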

6.2 Nonterminal Weak Cycles 

Theorems 2 and 3 leave open the complexity of CERTAINTY(q) when the attack graph of q contains one or more nonterminal weak cycles and no strong cycle. In this section, we zoom in on acyclic queries AC(k), defined next for $k \in \{2, 3, \dots\}$, whose attack graph contains $k(k-1)/2$ nonterminal weak cycles and no strong cycle. By showing tractability of CERTAINTY(AC(k)), we obtain more supporting evidence for Conjecture 1. As a side result, we will solve a complexity issue raised by Fuxman and Miller [10].

Definition 8 For $k \geq 2$, let C(k) and AC(k) denote the following Boolean conjunctive queries without self-join.

$$C(k) = \{R_1(\underline{x_1}, x_2), R_2(\underline{x_2}, x_3), \dots, R_{k-1}(\underline{x_{k-1}}, x_k), R_k(\underline{x_k}, x_1)\},$$

$$AC(k) = \{R_1(\underline{x_1}, x_2), R_2(\underline{x_2}, x_3), \dots, R_{k-1}(\underline{x_{k-1}}, x_k), R_k(\underline{x_k}, x_1), S_k(\underline{x_1, x_2, \dots, x_k})\},$$

where $x_1, \dots, x_k$ are distinct variables and $R_1, \dots, R_k, S_k$ distinct relation names. For $i \in \{1, \dots, k\}$, relation name $R_i$ is of signature [2, 1], and $S_k$ is of signature [k, k]. ◁
Figure 5: Attack graph of AC(3). All cycles are weak and nonterminal.

Figure 6: At the left: uncertain database that is purified relative to AC(3). At the right: graph representation of $R_1 \cup R_2 \cup R_3$. Note that the three cycles encoded in $S_3$ are clockwise.

Obviously, a query q is acyclic if it contains an atom F such that $\mathsf{vars}(F) = \mathsf{vars}(q)$. Therefore, AC(k) is acyclic because the $S_k$-atom contains all variables that occur in the query. On the other hand, C(k) is acyclic if $k = 2$ and cyclic if $k \geq 3$.
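These acyclicity claims can be checked mechanically for small k. The sketch below builds C(k) and AC(k) and tests acyclicity with the classical GYO reduction, a technique swapped in here for illustration (the paper itself reasons with join trees). The encoding of atoms as (relation name, tuple of variables) and the function names are assumptions of this sketch; primary keys play no role in the acyclicity test and are omitted.

```python
def C(k):
    """Atoms of C(k); the second variable of R_k wraps around to x1."""
    return [(f"R{i}", (f"x{i}", f"x{i % k + 1}")) for i in range(1, k + 1)]


def AC(k):
    """Atoms of AC(k): C(k) plus the atom S_k over all variables."""
    return C(k) + [(f"S{k}", tuple(f"x{i}" for i in range(1, k + 1)))]


def is_acyclic(query):
    """GYO reduction on the hypergraph whose hyperedges are the atoms' variable sets."""
    edges = [set(variables) for _, variables in query]
    changed = True
    while changed:
        changed = False
        # Rule 1: drop variables that occur in exactly one hyperedge.
        for e in edges:
            lonely = {v for v in e if sum(v in f for f in edges) == 1}
            if lonely:
                e -= lonely
                changed = True
        # Rule 2: drop one hyperedge that is empty or contained in another one.
        for i, e in enumerate(edges):
            if not e or any(i != j and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    # The query is acyclic exactly when the reduction removes every hyperedge.
    return not edges


for k in range(2, 6):
    print(k, is_acyclic(C(k)), is_acyclic(AC(k)))
```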

For $i \in \{1, \dots, k\}$, the attack graph of AC(k) contains attacks from the $R_i$-atom to every other atom. Figure 5 shows the attack graph of AC(3). All attack cycles are weak, but Theorem 3 does not apply because the cycles are nonterminal.

CERTAINTY(C(k)) was claimed coNP-hard for all $k \geq 2$ in [10]. Later, however, Wijsen E2l found a mistake in the proof of that claim and showed that CERTAINTY(C(k)) is tractable if $k = 2$. The complexity of CERTAINTY(C(k)) for $k \geq 3$ will be settled by Corollary 1.

Theorem 4 For $k \geq 2$, CERTAINTY(AC(k)) is in P.

Proof (Extended sketch.) Let db be an uncertain database with schema $\{R_1, \dots, R_k, S_k\}$. By Lemma 1, we can assume without loss of generality that db is purified relative to AC(k). Let D be the active domain of db. For every $i \in \{1, \dots, k\}$, define $\mathsf{type}(x_i)$ as the subset of D that contains a if for some valuation $\mu$, $\mu_{[x_i \mapsto a]}(AC(k)) \subseteq \mathbf{db}$. Since AC(k) has no self-join, we can assume without loss of generality that $i \neq j$ implies $\mathsf{type}(x_i) \cap \mathsf{type}(x_j) = \emptyset$.



Figure 7: Graph representation of two repairs (of the uncertain database of Fig. 6) that falsify AC(3). The left cycle is anticlockwise and not encoded in $S_3$.

For example, assume $R_i(a,b), R_j(c,d) \in \mathbf{db}$ with $i < j$. Since $a \in \mathsf{type}(x_i)$, $b \in \mathsf{type}(x_{i+1})$, and $c \in \mathsf{type}(x_j)$, it follows that $a \neq c$ and that $b = c$ implies $j = i + 1$.

The $R_i$-facts of db can be viewed as edges of a directed graph ($1 \leq i \leq k$). This is illustrated in Fig. 6 for $k = 3$. Let $G = (V, E)$ be the directed graph such that $V = D$ and $E = \{(a,b) \mid R_i(a,b) \in \mathbf{db} \text{ for some } i\}$. Then, G is k-partite with vertex classes $\mathsf{type}(x_1), \dots, \mathsf{type}(x_k)$. Furthermore, whenever $(a,b) \in E$ and $a \in \mathsf{type}(x_i)$, then $b \in \mathsf{type}(x_{i+1})$ if $i < k$ (and $b \in \mathsf{type}(x_1)$ if $i = k$). It follows that the length of every elementary cycle in G must be a multiple of k. Since db is purified, no vertex has zero outdegree. We define $\mathcal{C}$ as the set of cycles of length k such that if db contains $S_k(a_1, \dots, a_k)$, then $\mathcal{C}$ contains the cycle $a_1, a_2, \dots, a_k, a_1$.

Since db is purified, G is a vertex-disjoint union of strong components $S_1, \dots, S_\ell$ (for some $\ell \geq 0$) such that for $i \neq j$, no edge leads from a vertex in $S_i$ to a vertex in $S_j$.²

In what follows, some vertices and edges of G will be marked. It is straightforward that $\mathbf{db} \notin$ CERTAINTY(AC(k)) is equivalent to the following.

It is possible to mark exactly one outgoing edge for each vertex of G without marking all edges of some cycle in $\mathcal{C}$. (5)
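To make the marking condition concrete, the following sketch checks it by exhaustive enumeration of all markings. It is exponential in the number of vertices and is meant only to illustrate the condition on tiny inputs; it is not the polynomial-time procedure of the proof. The encoding (edges as pairs, cycles as tuples of vertices) and the function name are assumptions of this sketch, and every vertex is assumed to have an outgoing edge, as in a purified database.

```python
from itertools import product


def marking_exists(edges, cycles):
    """Brute-force test: can one outgoing edge per vertex be marked
    without marking all edges of some cycle in `cycles`?"""
    out = {}
    for a, b in edges:
        out.setdefault(a, []).append(b)
    vertices = sorted(out)
    # Each cycle (a1, ..., ak) stands for the edges a1->a2, ..., ak->a1.
    cycle_edges = [set(zip(c, c[1:] + c[:1])) for c in cycles]
    for choice in product(*(out[v] for v in vertices)):
        marked = set(zip(vertices, choice))
        if not any(ce <= marked for ce in cycle_edges):
            return True
    return False


# Hypothetical graphs for k = 2, used only for illustration.
small = [("a", "b"), ("b", "a")]
larger = [("a", "b"), ("a", "b2"), ("b", "a"), ("b2", "a")]
print(marking_exists(small, [("a", "b")]))
print(marking_exists(larger, [("a", "b")]))
print(marking_exists(larger, [("a", "b"), ("a", "b2")]))
```

In the first graph the only possible marking covers the single listed cycle, so no admissible marking exists. In the second call the vertex a can avoid the listed cycle by choosing the edge to b2; in the third call both choices complete a listed cycle.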

We provide a polynomial-time algorithm for testing condition (5). Marking one outgoing edge for each vertex will create a cycle of marked edges in each strong component.

For each strong component $S_i$, consider the following cases successively and execute the first one that applies.

Case $S_i$ contains a cycle of length k that does not belong to $\mathcal{C}$. Such a cycle is illustrated by Fig. 7 (left). Mark all vertices and edges of the cycle. Notice that the number of cycles of length k is at most $|V|^k$, which is polynomial in the size of db.

Case $S_i$ contains an elementary cycle of length (strictly) greater than k. Such a cycle is illustrated by Fig. 7 (right). Mark all vertices and edges of the cycle. To see that this step is in polynomial time, notice that the following are equivalent:

• $S_i$ contains an elementary cycle of length greater than k.

• $S_i$ contains a path $a_1, a_2, \dots, a_k, a_{k+1}$ such that $a_1 \neq a_{k+1}$ and $S_i$ contains a path from $a_{k+1}$ to $a_1$ that contains no edge from $\{a_1, a_2, \dots, a_k\} \times V$.

The latter condition can be tested in polynomial time, because there are at most $|V|^{k+1}$ distinct choices for $a_1, a_2, \dots, a_k, a_{k+1}$ and paths can be found in polynomial time.

Case neither of the above two cases applies. Conclude that (5) is false.

If after the previous step every strong component contains a cycle of marked edges, then it is correct to conclude that (5) is true. Notice that every cycle of $\mathcal{C}$ now contains at least one unmarked edge. We can achieve (5) by marking, for each yet unmarked vertex, the vertices and edges on a shortest path to some marked vertex. This can be done without creating new cycles of marked edges. □

² A strong component of a graph G is a maximal strongly connected subgraph of G. A graph is strongly connected if there is a path from any vertex to any other.

Since query C(k) is cyclic if $k \geq 3$, attack graphs are not defined for C(k) if $k \geq 3$. Nevertheless, the following lemma immediately implies that if CERTAINTY(AC(k)) is tractable, then so is CERTAINTY(C(k)).

Lemma 9 Let q be a Boolean conjunctive query without self-join. If $q' \subseteq q$ and every atom in $q \setminus q'$ is all-key, then there exists an AC$^0$ many-one reduction from CERTAINTY($q'$) to CERTAINTY($q$).

Corollary 1 For $k \geq 2$, CERTAINTY(C(k)) is in P.

Unsurprisingly, there exist acyclic queries $q \notin \{AC(k) \mid k \geq 2\}$ whose attack graph contains some nonterminal cycle and no strong cycle. The complexity of CERTAINTY(q) for such queries q is open.

7 Uncertainty and Probability 

In this section, we study the relationship between the complexities of CERTAINTY(q) and evaluating q on probabilistic databases. The motivation is that, on input of an uncertain database db, the problem CERTAINTY(q) is solved if we can determine whether query q evaluates to probability 1 on the probabilistic database obtained from db by assuming a uniform probability distribution over the set of repairs of db. We show, however, that this approach provides no new insights into the tractability frontier of CERTAINTY(q).

7.1 Background from Probabilistic Databases 

In this section, we review an important result from probabilis- 
tic database theory. 

Definition 9 A possible world w of uncertain database db is a consistent subset of db. The set of possible worlds of db is denoted worlds(db). Notice that possible worlds, unlike repairs, need not be maximal consistent.

A probabilistic database is a pair (db, Pr) where db is an uncertain database and $\Pr : \mathsf{worlds}(\mathbf{db}) \to [0, 1]$ is a total function such that $\sum_{w \in \mathsf{worlds}(\mathbf{db})} \Pr(w) = 1$. We will assume that the numbers in the codomain of Pr are rational. ◁



The following definition extends the function Pr to Boolean 
first-order queries q. 

Definition 10 Let (db, Pr) be a probabilistic database. Let q be a Boolean first-order query. We define

$$\Pr(q) = \sum_{w \in \mathsf{worlds}(\mathbf{db}) \,:\, w \models q} \Pr(w).$$

In words, Pr(q) sums up the probabilities of the possible worlds that satisfy q. ◁



Of special interest is the application of Definition 10 in case q is a single fact, or a Boolean combination of facts. Notice that if (db, Pr) is a probabilistic database and $A_1, \dots, A_n$ are distinct facts belonging to a same block of db, then $\Pr(A_1 \lor A_2 \lor \dots \lor A_n) = \sum_{i=1}^{n} \Pr(A_i)$, because no possible world can contain two distinct facts that belong to a same block.

Definition 11 Probabilistic database (db, Pr) is called block-independent-disjoint (BID) if the following holds: whenever $A_1, \dots, A_n$ are facts of db taken from n distinct blocks (for some $n \geq 1$), then $\Pr(A_1 \land A_2 \land \dots \land A_n) = \prod_{i=1}^{n} \Pr(A_i)$. ◁

Theorem 2.4 in [8] implies that every BID probabilistic database (db, Pr) is uniquely determined if Pr(A) is given for every fact $A \in \mathbf{db}$. This allows for an efficient encoding: rather than specifying Pr(w) for every $w \in \mathsf{worlds}(\mathbf{db})$, it suffices to specify Pr(A) for every $A \in \mathbf{db}$. In the complexity results that follow, this efficient encoding is assumed.

Notice that we can turn an uncertain database db into a BID probabilistic database by assuming that the probabilities of all repairs are equal and sum up to 1. A consistent subset of db that is not maximal would then have zero probability.
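The encoding by fact probabilities and Definition 10 can be illustrated with a small Python sketch. It enumerates all possible worlds, so it is exponential and serves only as an illustration. The representation is an assumption of this sketch: a BID database is a list of blocks, each block a dictionary from facts to rational probabilities, and a query is an arbitrary Python predicate on a set of facts.

```python
from fractions import Fraction
from itertools import product


def worlds(blocks):
    """Yield (world, probability) for every possible world of a BID database."""
    options = []
    for block in blocks:
        opts = list(block.items())
        rest = 1 - sum(block.values())
        if rest > 0:
            opts.append((None, rest))  # the world takes no fact from this block
        options.append(opts)
    for combo in product(*options):
        world = frozenset(fact for fact, _ in combo if fact is not None)
        prob = Fraction(1)
        for _, p in combo:
            prob *= p  # blocks are independent
        yield world, prob


def probability(blocks, query):
    """Pr(q): sum of the probabilities of the possible worlds that satisfy q."""
    return sum((p for w, p in worlds(blocks) if query(w)), Fraction(0))


def uniform_over_repairs(blocks_of_facts):
    """All repairs equally likely: each fact gets probability 1/|block|."""
    return [{f: Fraction(1, len(b)) for f in b} for b in blocks_of_facts]


# Hypothetical database: the block {R(a,1), R(a,2)} and the block {S(1,c)}.
bid = uniform_over_repairs([[("R", "a", 1), ("R", "a", 2)], [("S", 1, "c")]])


def joins(world):
    """The query 'exists x, y, z: R(x, y) and S(y, z)'."""
    return any(r[0] == "R" and s[0] == "S" and r[2] == s[1]
               for r in world for s in world)


print(probability(bid, joins))
```

Only the repair containing R(a,1) satisfies the query, so the computed probability is 1/2 and the query is not certain on this database.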

Function IsSafe(q): Determine whether q is safe
Input: q is a Boolean conjunctive query without self-join.
Result: Boolean in {true, false}.
begin
    R1: if |q| = 1 and vars(q) = ∅ then
        return true;
    R2: if q = q1 ∪ q2 with q1 ≠ ∅ ≠ q2, vars(q1) ∩ vars(q2) = ∅ then
        return IsSafe(q1) ∧ IsSafe(q2);
    /* a is an arbitrary constant */
    R3: if ⋂_{F∈q} key(F) ≠ ∅ then
        select x ∈ ⋂_{F∈q} key(F);
        return IsSafe(q[x↦a]);
    R4: if there exists F ∈ q such that key(F) = ∅ ≠ vars(F) then
        select F ∈ q such that key(F) = ∅ ≠ vars(F);
        select x ∈ vars(F);
        return IsSafe(q[x↦a]);
    if none of the above then
        return false;
Definition 12 Given a Boolean query q, PROBABILITY(q) is the following function problem: on input of a BID probabilistic database (db, Pr), determine the value of Pr(q).

A Boolean conjunctive query q, without self-join, is called safe if Algorithm IsSafe returns true. ◁
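The rules R1 to R4 of Algorithm IsSafe can be sketched in Python as follows. The encoding is an assumption of this sketch: an atom records its relation name, the terms at primary-key positions, and the remaining terms; variables are strings prefixed with `?`. For rule R2 the sketch splits off one component of atoms connected through shared variables, which is one admissible choice of q1 and q2. Returning false on the empty query is a choice of this sketch, since no rule applies to it.

```python
from typing import NamedTuple, Tuple


class Atom(NamedTuple):
    rel: str
    key: Tuple[str, ...]   # terms at primary-key positions
    rest: Tuple[str, ...]  # terms at the remaining positions


def is_var(term):
    return term.startswith("?")


def key_vars(atom):
    return {t for t in atom.key if is_var(t)}


def all_vars(atom):
    return {t for t in atom.key + atom.rest if is_var(t)}


def substitute(q, x, a="a"):
    """Replace variable x by the (arbitrary) constant a in every atom."""
    def swap(terms):
        return tuple(a if t == x else t for t in terms)
    return frozenset(Atom(F.rel, swap(F.key), swap(F.rest)) for F in q)


def components(q):
    """Split q into groups of atoms connected through shared variables."""
    comps = []
    for F in q:
        atoms, variables = {F}, all_vars(F)
        untouched = []
        for other_atoms, other_vars in comps:
            if other_vars & variables:
                atoms |= other_atoms
                variables |= other_vars
            else:
                untouched.append((other_atoms, other_vars))
        comps = untouched + [(atoms, variables)]
    return [frozenset(atoms) for atoms, _ in comps]


def is_safe(q):
    q = frozenset(q)
    if not q:
        return False  # sketch's choice: no rule applies to the empty query
    if len(q) == 1 and not any(all_vars(F) for F in q):      # R1
        return True
    comps = components(q)
    if len(comps) > 1:                                        # R2
        return is_safe(comps[0]) and is_safe(q - comps[0])
    common = set.intersection(*(key_vars(F) for F in q))
    if common:                                                # R3
        return is_safe(substitute(q, min(common)))
    for F in sorted(q):                                       # R4
        if not key_vars(F) and all_vars(F):
            return is_safe(substitute(q, min(all_vars(F))))
    return False


# Hypothetical queries; the key of R is its first position in both.
q_safe = [Atom("R", ("?x",), ("?y",)), Atom("S", ("?x", "?y"), ())]
q_unsafe = [Atom("R", ("?x",), ("?y",)), Atom("S", ("?y",), ("?z",))]
print(is_safe(q_safe), is_safe(q_unsafe))
```

On the first query, R3 eliminates `?x`, R4 eliminates `?y`, and R2 then splits the two variable-free atoms. On the second query the keys share no variable and no atom has a variable-free key, so no rule applies.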



The following result establishes a dichotomy in the complexity of PROBABILITY(q).

Theorem 5 ([8]) Let q be a Boolean conjunctive query without self-join.

1. If q is safe, then PROBABILITY(q) is in FP.

2. If q is not safe, then PROBABILITY(q) is ♮P-hard.

7.2 Comparing Complexities

The following proposition establishes a straightforward relationship between the problems PROBABILITY(q) and CERTAINTY(q). The only subtlety is that a repair contains a fact of each block, while a possible world and a block may have an empty intersection (recall that possible worlds, unlike repairs, need not be maximal consistent). In the statement of this proposition, db' restricts db to the set of blocks whose probabilities sum up to 1.



Proposition 1 Let (db, Pr) be a BID probabilistic database. Let db' be the smallest subset of db that contains every block b of db such that $\sum_{A \in b} \Pr(A) = 1$. Let q be a Boolean conjunctive query. Then the following are equivalent:

1. $\mathbf{db}' \in$ CERTAINTY($q$).

2. On input (db, Pr), the answer to the function problem PROBABILITY(q) is 1.

The following theorem establishes a nontrivial relationship between the complexities of CERTAINTY(q) and PROBABILITY(q). Notice that the query q in the theorem's statement is not required to be acyclic.

Theorem 6 Let q be a Boolean conjunctive query without self-join. If q is safe, then CERTAINTY(q) is first-order expressible.

Proof The proof runs by induction on the execution of Algorithm IsSafe. Since q is safe, some rule of IsSafe applies to q.

Case R1 applies. If q consists of a single fact, then CERTAINTY(q) is obviously first-order expressible.

Case R2 applies. Let $q = q_1 \cup q_2$ with $q_1 \neq \emptyset \neq q_2$ and $\mathsf{vars}(q_1) \cap \mathsf{vars}(q_2) = \emptyset$. Since q is safe, $q_1$ and $q_2$ are safe by definition of safety. By the induction hypothesis, there exists a certain first-order rewriting $\varphi_1$ of $q_1$, and a certain first-order rewriting $\varphi_2$ of $q_2$. Obviously, $\varphi_1 \land \varphi_2$ is a certain first-order rewriting of q.

Case R3 applies. Assume variable x such that for every $F \in q$, $x \in \mathsf{key}(F)$. By definition of safety, $q_{[x \mapsto c]}$ is safe. It can be easily seen that $\mathbf{db} \in$ CERTAINTY($q$) if and only if for some constant a, $\mathbf{db} \in$ CERTAINTY($q_{[x \mapsto a]}$). By the induction hypothesis, CERTAINTY($q_{[x \mapsto a]}$) is first-order expressible. Let $\varphi$ be a certain first-order rewriting of $q_{[x \mapsto c]}$, where we assume without loss of generality that c is a constant that does not occur in q. Let $\varphi(x)$ be the first-order formula obtained from $\varphi$ by replacing each occurrence of c with x. Then, $\exists x \varphi(x)$ is a certain first-order rewriting of q.

Case R4 applies. Assume $F \in q$ such that $\mathsf{key}(F) = \emptyset$ and $\mathsf{vars}(F) \neq \emptyset$. Let $\vec{x}$ be a sequence of distinct variables such that $\mathsf{vars}(\vec{x}) = \mathsf{vars}(F)$. Let $\vec{a} = (a, a, \dots, a)$ be a sequence of length $|\vec{x}|$. By definition of safety, $q_{[\vec{x} \mapsto \vec{a}]}$ is safe. By the induction hypothesis, CERTAINTY($q_{[\vec{x} \mapsto \vec{a}]}$) is first-order expressible. From Lemma 8.6 in [23], it follows that CERTAINTY(q) is first-order expressible. □

Corollary 2 Let q be a Boolean conjunctive query without self-join. If CERTAINTY(q) is not first-order expressible, then the function problem PROBABILITY(q) is ♮P-hard.

For acyclic queries, the only complexities of CERTAINTY(q) left open by Theorems 1, 2, and 3 concern queries q with a cyclic attack graph (in particular, an attack graph without strong cycle and with at least one nonterminal weak cycle). For such a query q, CERTAINTY(q) is not first-order expressible (by Theorem 1), hence PROBABILITY(q) is ♮P-hard (by Corollary 2). Consequently, the probabilistic database approach fails to provide further insight into the tractability frontier of CERTAINTY(q). It turns out that the queries q for which PROBABILITY(q) is tractable form a very restricted subset of the queries for which CERTAINTY(q) is tractable (assuming FP ≠ ♮P and P ≠ coNP).



8 Discussion 

In the following, we say that a class $\mathcal{P}$ of function problems exhibits an effective FP-♮P-dichotomy if all problems in $\mathcal{P}$ are either in FP or ♮P-hard and it is decidable whether a given problem in $\mathcal{P}$ is in FP or ♮P-hard. Likewise, we say that a class $\mathcal{P}$ of decision problems exhibits an effective P-coNP-dichotomy if all problems in $\mathcal{P}$ are either in P or coNP-hard and it is decidable whether a given problem in $\mathcal{P}$ is in P or coNP-hard.

Recall from Section 1 that ♮CERTAINTY(q) is the counting variant of CERTAINTY(q), which takes as input an uncertain database db and asks how many repairs of db satisfy query q. For the probabilistic and counting variants of CERTAINTY(q), the following dichotomies have been established.

Theorem 7 ([8], [15]) The following classes exhibit an effective FP-♮P-dichotomy:

1. the class containing PROBABILITY(q) for all Boolean conjunctive queries q without self-join; and

2. the class containing ♮CERTAINTY(q) for all Boolean conjunctive queries q without self-join.
Theorem 2 and Conjecture 1 imply the following conjecture, which is thus weaker than Conjecture 1.

Conjecture 2 The class containing CERTAINTY(q) for all acyclic Boolean conjunctive queries q without self-join exhibits an effective P-coNP-dichotomy.

From Theorems 2 and 3, it follows that in order to prove Conjecture 2, it suffices to show that an effective P-coNP-dichotomy is exhibited by the class containing CERTAINTY(q) for all queries q whose attack graph contains some nonterminal cycle and no strong cycle.

We confidently believe that the P-coNP-dichotomy of Conjecture 2 (if true) will be harder to prove than the FP-♮P-dichotomies established by Theorem 7, for the following reasons. All problems PROBABILITY(q) that are in FP can be solved by a single, fairly simple polynomial-time algorithm which appears in [8]. Likewise, all problems ♮CERTAINTY(q) in FP can be solved by a single, fairly simple polynomial-time algorithm [15]. On the other hand, CERTAINTY(q) problems in P seem to ask for sophisticated polynomial-time algorithms. In their proof that Conjecture 2 holds for queries with exactly two atoms, Kolaitis and Pema [13] made use of an ingenious polynomial-time algorithm of Minty [11]. Our proof of Theorem 4 uses algorithms from (directed) graph theory. Despite their sophistication, these polynomial-time algorithms only solve restricted cases of CERTAINTY(q).

Notice also that by Corollary 2 and Theorem 1, the function problem PROBABILITY(q) is intractable for all acyclic queries q with a cyclic attack graph. On the other hand, cycles in attack graphs are exactly what makes Conjecture 2 hard to prove.
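A Proofs of Section 3

Proof of Lemma 1. In polynomial time, we can construct a maximal sequence $\mathbf{db}_0 \xrightarrow{A_1} \mathbf{db}_1 \xrightarrow{A_2} \mathbf{db}_2 \cdots \xrightarrow{A_n} \mathbf{db}_n$ such that for $i \in \{1, 2, \dots, n\}$,

1. $A_i \in \mathbf{db}_{i-1}$;

2. there exists no valuation $\theta$ such that $A_i \in \theta(q) \subseteq \mathbf{db}_{i-1}$; and

3. $\mathbf{db}_i = \mathbf{db}_{i-1} \setminus \mathsf{block}(A_i, \mathbf{db}_{i-1})$.

Clearly, $\mathbf{db}_n$ is purified relative to q. We show that for $i \in \{1, 2, \dots, n\}$,

$$\mathbf{db}_{i-1} \in \text{CERTAINTY}(q) \iff \mathbf{db}_i \in \text{CERTAINTY}(q).$$

$\Longrightarrow$ By contraposition. Assume that r is a repair of $\mathbf{db}_i$ such that $r \not\models q$. Then, $r \cup \{A_i\}$ is a repair of $\mathbf{db}_{i-1}$ that falsifies q. $\Longleftarrow$ By contraposition. Assume that r is a repair of $\mathbf{db}_{i-1}$ such that $r \not\models q$. Obviously, $r \setminus \mathsf{block}(A_i, \mathbf{db}_{i-1})$ is a repair of $\mathbf{db}_i$ that falsifies q. □

B Proofs of Section 4

Proof of Lemma 2. Assume $F \rightsquigarrow G$. Let $\tau$ be a join tree for q. Let $F \overset{L_1}{\frown} F_1 \cdots F_{m-1} \overset{L_m}{\frown} G$ be the path in $\tau$ between F and G ($m \geq 1$).

We have $L_m \subseteq \mathsf{vars}(G)$. Since $\mathcal{K}(q \setminus \{F\}) \models \mathsf{key}(G) \to L_m$ and $L_m \not\subseteq F^{+,q}$ (because $F \rightsquigarrow G$), it must be the case that $\mathcal{K}(q \setminus \{F\}) \not\models \mathsf{key}(F) \to \mathsf{key}(G)$, hence $\mathsf{key}(G) \not\subseteq F^{+,q}$.

We have $L_1 \subseteq \mathsf{vars}(F)$. Since $L_1 \not\subseteq F^{+,q}$ (because $F \rightsquigarrow G$), it must be the case that $\mathsf{vars}(F) \not\subseteq F^{+,q}$. □

C Proofs of Section 5

Proof of Lemma 4. We show that if the attack graph of q contains a strong cycle of length n with $n \geq 3$, then it contains a strong cycle of some length m with $m < n$.

Let $H_0 \rightsquigarrow H_1 \rightsquigarrow H_2 \cdots \rightsquigarrow H_{n-1} \rightsquigarrow H_0$ be a strong cycle of length n ($n \geq 3$) in the attack graph of q, where $i \neq j$ implies $H_i \neq H_j$. Assume without loss of generality that the attack $H_0 \rightsquigarrow H_1$ is strong. Thus, $\mathcal{K}(q) \not\models \mathsf{key}(H_0) \to \mathsf{key}(H_1)$.

We write $i \oplus j$ as shorthand for $(i + j) \bmod n$. If $H_1 \rightsquigarrow H_{1 \oplus 2}$, then $H_0 \rightsquigarrow H_1 \rightsquigarrow H_{1 \oplus 2} \cdots \rightsquigarrow H_{n-1} \rightsquigarrow H_0$ is a strong cycle of length $n - 1$, and the desired result holds. Assume next $H_1 \not\rightsquigarrow H_{1 \oplus 2}$. By Lemma 3, $H_2 \rightsquigarrow H_1$. We distinguish two cases.

Case $H_2 \rightsquigarrow H_1$ is a strong attack. Then $H_1 \rightsquigarrow H_2 \rightsquigarrow H_1$ is a strong cycle of length $2 < n$.

Case $H_2 \rightsquigarrow H_1$ is a weak attack. If $H_1 \rightsquigarrow H_0$, then $H_0 \rightsquigarrow H_1 \rightsquigarrow H_0$ is a strong cycle of length $2 < n$. Assume next $H_1 \not\rightsquigarrow H_0$. Then, from $H_0 \rightsquigarrow H_1 \rightsquigarrow H_2$ and Lemma 3, it follows $H_0 \rightsquigarrow H_2$. The cycle $H_0 \rightsquigarrow H_2 \rightsquigarrow H_{2 \oplus 1} \cdots \rightsquigarrow H_{n-1} \rightsquigarrow H_0$ has length $n - 1$. It suffices to show that the attack $H_0 \rightsquigarrow H_2$ is strong. Assume towards a contradiction that the attack $H_0 \rightsquigarrow H_2$ is weak. Then, $\mathcal{K}(q) \models \mathsf{key}(H_0) \to \mathsf{key}(H_2)$. Since $H_2 \rightsquigarrow H_1$ is a weak attack, $\mathcal{K}(q) \models \mathsf{key}(H_2) \to \mathsf{key}(H_1)$. By transitivity, $\mathcal{K}(q) \models \mathsf{key}(H_0) \to \mathsf{key}(H_1)$, a contradiction. This concludes the proof. □

Proof of Sublemma 2. 1. $\Longrightarrow$ Obviously, $\mathsf{key}(F) \subseteq F^{+,q}$. From $G \rightsquigarrow F$ and Lemma 2, it follows $\mathsf{key}(F) \not\subseteq G^{+,q}$. Thus, we can assume $u \in \mathsf{key}(F)$ such that $u \in F^{+,q} \setminus G^{+,q}$. Hence, $\theta_1(u) = \theta_1(x)$ and $\theta_2(u) = \theta_2(x)$. Since $\theta_1(F)$ and $\theta_2(F)$ are key-equal by the premise, $\theta_1$ and $\theta_2$ agree on all variables of $\mathsf{key}(F)$. In particular, $\theta_1(u) = \theta_2(u)$. It follows $\theta_1(x) = \theta_2(x)$.

1. $\Longleftarrow$ Assume $\theta_1(x) = \theta_2(x)$. Since $\mathsf{key}(F) \subseteq F^{+,q}$, and since neither y nor z occurs inside $F^{+,q}$ in the Venn diagram of Fig. 3, it is correct to conclude that $\theta_1(F)$ and $\theta_2(F)$ are key-equal.

2. $\Longrightarrow$ From $F \rightsquigarrow G$ and Lemma 2, it follows $\mathsf{vars}(F) \not\subseteq F^{+,q}$. Since $\mathsf{key}(F) \subseteq F^{+,q}$, we can assume a variable $u \in \mathsf{vars}(F) \setminus \mathsf{key}(F)$ such that $u \notin F^{+,q}$. From the premise $\theta_1(F) = \theta_2(F)$, it follows $\theta_1(u) = \theta_2(u)$. Since y occurs in all regions outside $F^{+,q}$ in the Venn diagram, we conclude $\theta_1(y) = \theta_2(y)$. Finally, $\theta_1(x) = \theta_2(x)$ follows from item 1 proved before.

2. $\Longleftarrow$ Assume $\theta_1(x) = \theta_2(x)$ and $\theta_1(y) = \theta_2(y)$. Since $\mathsf{vars}(F) \subseteq F^{\boxplus,q}$, and since z does not occur inside $F^{\boxplus,q}$ in the Venn diagram, it is correct to conclude $\theta_1(F) = \theta_2(F)$. This concludes the proof of Sublemma 2. □

Proof of Sublemma 3. 1. $\Longrightarrow$ Obviously, $\mathsf{key}(G) \subseteq G^{+,q}$. Since $F \rightsquigarrow G$ is a strong attack, $\mathsf{key}(G) \not\subseteq F^{\boxplus,q}$. We can assume $u \in \mathsf{key}(G)$ such that $u \in G^{+,q} \setminus F^{\boxplus,q}$. Consequently, $\theta_1(u) = \langle \theta_1(y), \theta_1(z) \rangle$ and $\theta_2(u) = \langle \theta_2(y), \theta_2(z) \rangle$. Since $\theta_1(G)$ and $\theta_2(G)$ are key-equal by the premise, $\theta_1$ and $\theta_2$ agree on all variables of $\mathsf{key}(G)$. In particular, $\theta_1(u) = \theta_2(u)$. It follows $\theta_1(y) = \theta_2(y)$ and $\theta_1(z) = \theta_2(z)$.

1. $\Longleftarrow$ Assume $\theta_1(y) = \theta_2(y)$ and $\theta_1(z) = \theta_2(z)$. Since $\mathsf{key}(G) \subseteq G^{+,q}$, and since x does not occur inside $G^{+,q}$ in the Venn diagram, it is correct to conclude that $\theta_1(G)$ and $\theta_2(G)$ are key-equal.

2. $\Longrightarrow$ From $G \rightsquigarrow F$ and Lemma 2, it follows $\mathsf{vars}(G) \not\subseteq G^{+,q}$. Since $\mathsf{key}(G) \subseteq G^{+,q}$, we can assume a variable $u \in \mathsf{vars}(G) \setminus \mathsf{key}(G)$ such that $u \notin G^{+,q}$. From the premise $\theta_1(G) = \theta_2(G)$, it follows $\theta_1(u) = \theta_2(u)$. Since x occurs in all regions outside $G^{+,q}$ in the Venn diagram, we conclude $\theta_1(x) = \theta_2(x)$. Finally, $\theta_1(y) = \theta_2(y)$ and $\theta_1(z) = \theta_2(z)$ follow from item 1 proved before.

2. $\Longleftarrow$ Trivial. This concludes the proof of Sublemma 3. □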



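Proof of Sublemma 4. 1. Assume $r_0$ is a repair of $\mathbf{db}_0$. We first show that $\mathsf{map}(r_0) \cap \mathbf{db}_F$ contains no two distinct key-equal facts. Let $A, B \in \mathsf{map}(r_0) \cap \mathbf{db}_F$. We can assume $\theta_1, \theta_2 \in \mathcal{V}$ such that $\theta_1(F_0), \theta_2(F_0) \in r_0$, $A = \theta_1(F)$, and $B = \theta_2(F)$. By Sublemma 2, if A and B are key-equal and distinct, then $\theta_1(F_0)$ and $\theta_2(F_0)$ are key-equal and distinct, contradicting that $r_0$ is a repair. We conclude by contradiction that $\mathsf{map}(r_0) \cap \mathbf{db}_F$ contains no distinct key-equal facts.

In an analogous way, one can use Sublemma 3 to show that $\mathsf{map}(r_0) \cap \mathbf{db}_G$ contains no distinct key-equal facts. Since $\mathbf{db}_{rest}$ is consistent, it follows that $\mathsf{map}(r_0)$ is consistent.

We next show that $\mathsf{map}(r_0)$ is a maximal consistent subset of db. Let $A \in \mathbf{db}_F$. We need to show that $\mathsf{map}(r_0)$ contains a fact that is key-equal to A. We can assume $\theta \in \mathcal{V}$ such that $A = \theta(F)$. Since $\mathbf{db}_0$ contains $\theta(F_0)$ (by definition of $\mathcal{V}$), $r_0$ contains $\theta'(F_0)$ for some $\theta' \in \mathcal{V}$ with $\theta'(x) = \theta(x)$. Consequently, $\mathsf{map}(r_0)$ contains $\theta'(F)$. By Sublemma 2, A and $\theta'(F)$ are key-equal. We conclude that $\mathsf{map}(r_0)$ contains a fact that is key-equal to A.

In an analogous way, one can use Sublemma 3 to show that for every $A \in \mathbf{db}_G$, $\mathsf{map}(r_0)$ contains a fact that is key-equal to A.

2. Let r be a repair of db. Let $r_0$ be the following subset of $\mathbf{db}_0$:

$$r_0 = \{\theta(F_0) \mid \theta(F) \in r, \theta \in \mathcal{V}\} \cup \{\theta(G_0) \mid \theta(G) \in r, \theta \in \mathcal{V}\}.$$

We show $r = \mathsf{map}(r_0)$. Since $\mathbf{db}_{rest} \subseteq r \cap \mathsf{map}(r_0)$, it suffices to show $r \setminus \mathbf{db}_{rest} \subseteq \mathsf{map}(r_0)$ and $\mathsf{map}(r_0) \setminus \mathbf{db}_{rest} \subseteq r$. Let $A \in r \setminus \mathbf{db}_{rest}$. Let $A = \theta(F)$, $\theta \in \mathcal{V}$ (the case where $A = \theta(G)$ is analogous). Then by definition of $r_0$, $\theta(F_0) \in r_0$, hence $\theta(F) \in \mathsf{map}(r_0)$. Conversely, let $A \in \mathsf{map}(r_0) \setminus \mathbf{db}_{rest}$. Let $A = \theta(F)$, $\theta \in \mathcal{V}$ (the case where $A = \theta(G)$ is analogous). By (3), $\theta(F_0) \in r_0$. We can assume $\theta' \in \mathcal{V}$ such that $\theta'(F) \in r$ and $\theta'(F_0) = \theta(F_0)$. By Sublemma 2, $\theta'(F_0) = \theta(F_0)$ implies $\theta'(F) = \theta(F)$, hence $A \in r$.

Using Sublemmas 2 and 3, it is straightforward to show that $r_0$ is a repair of $\mathbf{db}_0$.

3. Let $r_0, r_0'$ be distinct repairs of $\mathbf{db}_0$. Then there exist distinct key-equal facts A, B such that $A \in r_0$ and $B \in r_0'$. Assume A, B are $R_0$-facts (the case where A, B are $S_0$-facts is analogous). There exist valuations $\theta_1, \theta_2 \in \mathcal{V}$ such that $A = \theta_1(F_0)$ and $B = \theta_2(F_0)$. By Sublemma 2, $\theta_1(F)$ and $\theta_2(F)$ are distinct and key-equal. Since $\theta_1(F) \in \mathsf{map}(r_0)$ and $\theta_2(F) \in \mathsf{map}(r_0')$, and since $\mathsf{map}(r_0)$, $\mathsf{map}(r_0')$ are consistent by property 1 shown earlier, it follows $\mathsf{map}(r_0) \neq \mathsf{map}(r_0')$. This concludes the proof of Sublemma 4. □

D Proofs of Section 6

Proof of Lemma 5. 1. Let $\tau$ be a join tree for q. Let $\tau'$ be the graph obtained from $\tau$ by replacing each vertex H with $H_{[z \mapsto c]}$, and by adjusting edge labels (that is, every label L is replaced with $L \setminus \{z\}$). Clearly, $\tau'$ is a join tree for $q'$.

2 and 3. Let $Q \subseteq q$. Let $X, Y \subseteq \mathsf{vars}(q)$. Let $Q' = Q_{[z \mapsto c]}$. In the next paragraph, we show that $\mathcal{K}(Q) \models X \to Y$ implies $\mathcal{K}(Q') \models X \setminus \{z\} \to Y \setminus \{z\}$.

The computation of the attribute closure $\{y \mid \mathcal{K}(Q) \models X \to y\}$ by means of a standard algorithm [1, page 165] corresponds to constructing a maximal sequence

$$\begin{array}{rll} X = & S_0 & H_1 \\ & S_1 & H_2 \\ & \vdots & \vdots \\ & S_{k-1} & H_k \\ & S_k & \end{array}$$

where

1. $S_0 \subsetneq S_1 \subsetneq \cdots \subsetneq S_{k-1} \subsetneq S_k$; and

2. for every $i \in \{1, 2, \dots, k\}$,

(a) $H_i \in Q$. Thus, $\mathcal{K}(Q)$ contains the functional dependency $\mathsf{key}(H_i) \to \mathsf{vars}(H_i)$.

(b) $\mathsf{key}(H_i) \subseteq S_{i-1}$ and $S_i = S_{i-1} \cup \mathsf{vars}(H_i)$.

Then, $S_k = \{y \mid \mathcal{K}(Q) \models X \to y\}$. Consider now the following sequence.

$$\begin{array}{rll} X \setminus \{z\} = & S_0 \setminus \{z\} & H_{1[z \mapsto c]} \\ & S_1 \setminus \{z\} & H_{2[z \mapsto c]} \\ & \vdots & \vdots \\ & S_{k-1} \setminus \{z\} & H_{k[z \mapsto c]} \\ & S_k \setminus \{z\} & \end{array}$$

Clearly, for every $i \in \{1, 2, \dots, k\}$,

1. $H_{i[z \mapsto c]} \in Q'$. Thus, $\mathcal{K}(Q')$ contains the functional dependency $\mathsf{key}(H_{i[z \mapsto c]}) \to \mathsf{vars}(H_{i[z \mapsto c]})$.

2. $\mathsf{key}(H_{i[z \mapsto c]}) \subseteq S_{i-1} \setminus \{z\}$ and $S_i \setminus \{z\} = (S_{i-1} \setminus \{z\}) \cup \mathsf{vars}(H_{i[z \mapsto c]})$.

It follows that if $\mathcal{K}(Q) \models X \to y$ and $y \neq z$, then $\mathcal{K}(Q') \models X \setminus \{z\} \to y$.

To prove 2, assume $F \not\rightsquigarrow G$. Then, the unique path in $\tau$ between F and G contains an edge with label L such that $\mathcal{K}(q \setminus \{F\}) \models \mathsf{key}(F) \to L$. It follows $\mathcal{K}(q' \setminus \{F'\}) \models \mathsf{key}(F') \to L \setminus \{z\}$. Since $L \setminus \{z\}$ is a label on the unique path in $\tau'$ between $F'$ and $G'$, it follows $F' \not\rightsquigarrow G'$.

To prove 3, assume the attack $F \rightsquigarrow G$ is weak. Then $\mathcal{K}(q) \models \mathsf{key}(F) \to \mathsf{key}(G)$, hence $\mathcal{K}(q') \models \mathsf{key}(F') \to \mathsf{key}(G')$. It follows that the attack $F' \rightsquigarrow G'$, if it exists, is weak. □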

Proof Lemma [6] Assume each cycle in the attack graph of $q$ is terminal. Assume towards a contradiction that three distinct atoms $F, G, H$ of $q$ belong to a same cycle such that $F \rightsquigarrow G$ and $G \rightsquigarrow H$. By Lemma [3], $F \rightsquigarrow H$ or $G \rightsquigarrow F$. In both cases, the attack graph contains a nonterminal cycle, a contradiction. □
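
The case analysis in this proof can be replayed on a toy graph. The Python sketch below encodes an attack graph as a set of directed edges and tests whether a cycle is terminal, that is, whether no edge leads from a vertex of the cycle to a vertex outside it. The vertex names and the helper `is_terminal` are hypothetical; the two extra edges are the two alternatives that Lemma [3] allows.

```python
def is_terminal(edges, cycle):
    """True if no edge leads from a vertex of `cycle` to a vertex outside it."""
    members = set(cycle)
    return all(target in members for source, target in edges if source in members)


# A cycle through three distinct atoms F, G, H.
three_cycle = {("F", "G"), ("G", "H"), ("H", "F")}

# From F ~> G and G ~> H, Lemma [3] yields F ~> H or G ~> F.
with_F_H = three_cycle | {("F", "H")}  # creates the cycle F ~> H ~> F
with_G_F = three_cycle | {("G", "F")}  # creates the cycle F ~> G ~> F

# In the first case the edge F ~> G leaves the cycle {F, H};
# in the second case the edge G ~> H leaves the cycle {F, G}.
assert not is_terminal(with_F_H, ["F", "H"])
assert not is_terminal(with_G_F, ["F", "G"])
```

Either alternative produces a two-vertex cycle with an outgoing edge, which is the nonterminal cycle the proof refers to.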

Proof Lemma [7] 1. Let $\tau$ be a join tree for $q$. By Lemma [6], every cycle in $q$'s attack graph is of the form $H \rightsquigarrow H' \rightsquigarrow H$. We show that if $H \rightsquigarrow H' \rightsquigarrow H$, then $\tau$ contains an edge $\{H, H'\}$. Assume towards a contradiction that $H \rightsquigarrow H' \rightsquigarrow H$ and there exists $I \in q$ such that $H \neq I \neq H'$ and $I$ lies on the (unique) path in $\tau$ between $H$ and $H'$. From $H \rightsquigarrow H'$, it follows $H \rightsquigarrow I$. Then the cycle $H \rightsquigarrow H' \rightsquigarrow H$ is not terminal, a contradiction.

Assume variable $x$ occurs in two distinct cycles of $q$'s attack graph. We can assume disjoint cycles $F \rightsquigarrow F' \rightsquigarrow F$ and $G \rightsquigarrow G' \rightsquigarrow G$ such that $x \in \mathrm{vars}(F) \cup \mathrm{vars}(F')$ and $x \in \mathrm{vars}(G)$. Assume towards a contradiction $x \notin \mathrm{key}(F)$. By the Connectedness Condition, if $e$ is an edge on the unique path in $\tau$ between $F$ and $G$ and $e \neq \{F, F'\}$, then the edge label of $e$ contains $x$. Since the cycle $F \rightsquigarrow F' \rightsquigarrow F$ is terminal, $F \not\rightsquigarrow G$. It follows $x \in F^{+,q}$. Since $x \notin \mathrm{key}(F)$, we can assume an atom $H \in q$ such that $H \neq F$ and $\mathrm{key}(H) \subseteq \mathrm{key}(F)$. From $F \rightsquigarrow F'$ and Lemma [2], it follows $H \neq F'$. By the premise of the lemma, we can assume $H' \in q \setminus \{F, F'\}$ such that $H \rightsquigarrow H' \rightsquigarrow H$. By the Connectedness Condition, every edge label on the path in $\tau$ between $F$ and $H$ contains $\mathrm{key}(H)$.

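Let $H \overset{L_1}{\frown} I_1 \overset{L_2}{\frown} I_2 \cdots \overset{L_{m-1}}{\frown} I_{m-1} \overset{L_m}{\frown} I_m$ with $I_m = F$ be the unique path in $\tau$ between $H$ and $F$. Two cases can occur.

Case $I_1 = H'$. From $H' \not\rightsquigarrow I_2$, it follows $L_2 \subseteq H'^{+,q}$. Since $\mathrm{key}(H) \subseteq L_2$, $\mathrm{key}(H) \subseteq H'^{+,q}$.

Case $I_1 \neq H'$. Since $H' \rightsquigarrow H$ and $H' \not\rightsquigarrow I_1$, it follows $L_1 \subseteq H'^{+,q}$. Since $\mathrm{key}(H) \subseteq L_1$, $\mathrm{key}(H) \subseteq H'^{+,q}$.

Hence, $\mathrm{key}(H) \subseteq H'^{+,q}$. Then by Lemma [2], $H' \not\rightsquigarrow H$, a contradiction. We conclude by contradiction that $x \in \mathrm{key}(F)$.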

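2. Assume $F \rightsquigarrow G$ is a weak attack. Let $x \in \mathrm{key}(G) \setminus \mathrm{key}(F)$. By property 1 proved above, for all $H \in q \setminus \{F, G\}$, $x \notin \mathrm{vars}(H)$. Since the attack $F \rightsquigarrow G$ is weak, $\mathcal{K}(q) \models \mathrm{key}(F) \to x$. Since $x$ does not occur in $q \setminus \{F, G\}$, it must be the case that $x \in \mathrm{vars}(F)$. It follows $\mathrm{key}(G) \subseteq \mathrm{vars}(F)$. □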



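Proof Lemma [8] Assume that $F$ is an $R$-atom. Since db is purified relative to $q$, we can assume that the set of $R$-facts of db is $\mathcal{R} = \{F_{[\vec{x} \mapsto \vec{a}_1]}, \dots, F_{[\vec{x} \mapsto \vec{a}_\ell]}\}$ for some $\ell \geq 0$ and $\vec{a}_1, \dots, \vec{a}_\ell \in D^{|\vec{x}|}$. Since $\mathrm{key}(F) = \emptyset$, all facts of $\mathcal{R}$ are key-equal.

1 ⟹ 2 Since db $\models q$, we have $\ell \geq 1$, hence db $\neq \emptyset$. Let $r$ be a repair of db and $i \in \{1, \dots, \ell\}$. We need to show that $r \models q'_{[\vec{x} \mapsto \vec{a}_i]}$. Since $(r \setminus \mathcal{R}) \cup \{F_{[\vec{x} \mapsto \vec{a}_i]}\}$ is a repair of db, it follows from the premise that $(r \setminus \mathcal{R}) \cup \{F_{[\vec{x} \mapsto \vec{a}_i]}\} \models q$. Since $q$ contains no self-join, it follows $(r \setminus \mathcal{R}) \models q'_{[\vec{x} \mapsto \vec{a}_i]}$, hence $r \models q'_{[\vec{x} \mapsto \vec{a}_i]}$.

2 ⟹ 1 Let $r$ be a repair of db. By the premise, for every $j \in \{1, \dots, \ell\}$, $r \models q'_{[\vec{x} \mapsto \vec{a}_j]}$. We can assume $i \in \{1, \dots, \ell\}$ such that $F_{[\vec{x} \mapsto \vec{a}_i]} \in r$. From $r \models q'_{[\vec{x} \mapsto \vec{a}_i]}$, it follows $r \models q$. □

Proof Sublemma [i] 1 ⟹ 2 Let $r = r_1 \cup r_2$ be a repair of db such that for all $i \in \{1, \dots, \ell\}$, for all partitions $P$ of $\mathbf{db}_i$,

• if $P \notin$ CERTAINTY($q_i$), then $r_1$ contains a repair of $P$ falsifying $q_i$; and

• if $P \in$ CERTAINTY($q_i$), then $r_2$ contains a repair of $P$.

Since db $\in$ CERTAINTY($q$) by the premise, we have $r \models q$. For all $i \in \{1, \dots, \ell\}$, for every valuation $\theta$, if $\theta(q_i) \subseteq r$, then $\theta(F_i), \theta(G_i)$ must belong to the same partition of $\mathbf{db}_i$, hence $\theta(F_i), \theta(G_i) \in r_2$. Consequently, $r_2 \models q$. Since $r_2 \subseteq \bigcup_{1 \leq i \leq \ell} [\![\mathbf{db}_i]\!]$, we have $\bigcup_{1 \leq i \leq \ell} [\![\mathbf{db}_i]\!] \models q$.

2 ⟹ 1 Let $r$ be a repair of db. Let $\vec{x}$ be a sequence of distinct variables containing every variable $x$ such that for some $1 \leq i < j \leq \ell$, $x \in \mathrm{vars}(q_i) \cap \mathrm{vars}(q_j)$. By the premise, we can assume $\vec{a} \in D^{|\vec{x}|}$ such that $\bigcup_{1 \leq i \leq \ell} [\![\mathbf{db}_i]\!] \models q_{[\vec{x} \mapsto \vec{a}]}$. Let $\theta$ be the valuation over $\mathrm{vars}(\vec{x})$ such that $\theta(\vec{x}) = \vec{a}$. For $1 \leq i \leq \ell$, $\mathrm{vars}(\vec{x}_i) \subseteq \mathrm{vars}(\vec{x})$. For $1 \leq i \leq \ell$, define $\vec{a}_i \in D^{|\vec{x}_i|}$ as the sequence of constants such that $\vec{a}_i = \theta(\vec{x}_i)$. Then, for each $i \in \{1, \dots, \ell\}$, $[\![\mathbf{db}_i]\!] \models q_{i[\vec{x}_i \mapsto \vec{a}_i]}$. Since $[\![\mathbf{db}_i]\!]$ contains a partition with vector $\vec{a}_i$ which belongs to CERTAINTY($q_i$) (by construction), every repair of $[\![\mathbf{db}_i]\!]$ satisfies $q_{i[\vec{x}_i \mapsto \vec{a}_i]}$. Since $r$ contains a repair of $[\![\mathbf{db}_i]\!]$, we conclude $r \models q_{i[\vec{x}_i \mapsto \vec{a}_i]}$. Since $\vec{x}$ contains all variables that occur in two distinct queries among $q_1, \dots, q_\ell$, it is correct to conclude $r \models q_{[\vec{x} \mapsto \vec{a}]}$. Since $r$ is an arbitrary repair of db, db $\in$ CERTAINTY($q$). □

Proof Lemma [9] Let $\mathcal{D}$ be the set of uncertain databases. We define a mapping $f : \mathcal{D} \to \mathcal{D}$ as follows. If db is an uncertain database with active domain $D$, then $f(\mathbf{db})$ is the smallest set such that

• for every fact $A \in$ db, if the relation name of $A$ occurs in $q'$, then $A \in f(\mathbf{db})$; and

• whenever $R(\vec{x})$ in $q \setminus q'$ and $\vec{a} \in D^{|\vec{x}|}$, then $f(\mathbf{db})$ contains $R(\vec{a})$.

It is straightforward that $f$ is first-order expressible and db $\in$ CERTAINTY($q'$) $\iff$ $f(\mathbf{db}) \in$ CERTAINTY($q$). □

Proof Corollary [I] By Lemma [9], there exists an AC$^0$ many-one reduction from CERTAINTY(C($k$)) to CERTAINTY(AC($k$)). Then by Theorem [], CERTAINTY(C($k$)) is in P. □



E Proofs of Section []



Proof Proposition [j] 1 ⟹ 2 Let $w \in \mathrm{worlds}(\mathbf{db})$ such that $\Pr(w) > 0$. We need to show $w \models q$. From $\Pr(w) > 0$, it follows that for every block $b$ of db, if $\sum_{A \in b} \Pr(A) = 1$, then $w$ contains a fact of $b$. Consequently, there exists a repair $r$ of db$'$ such that $r \subseteq w$. Since $r \models q$ by the premise, it follows $w \models q$.

2 ⟹ 1 Let $r$ be a repair of db$'$, hence $r \in \mathrm{worlds}(\mathbf{db})$. We have $\Pr(r) > 0$, because for every block $b$ of db, if $\sum_{A \in b} \Pr(A) = 1$, then $r$ contains a fact of $b$. By the premise, $r \models q$. □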
Proof Corollary [I] Assume CERTAINTY($q$) is not first-order expressible. By Theorem [6], query $q$ is not safe. By Theorem [5], PROBABILITY($q$) is $\sharp$P-hard. □
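
The proposition's proof in this section rests on one observation: a world of positive probability contains a fact from every block whose probabilities sum to 1. The Python sketch below enumerates the worlds of a tiny probabilistic database and checks that observation. The two-block database, the dictionary encoding, and the function names are hypothetical. The sketch also assumes that db' consists of exactly the blocks whose probabilities sum to 1, which is a reading suggested by the argument rather than a definition stated here.

```python
from fractions import Fraction
from itertools import product


def worlds(db):
    """Yield (world, probability) for a database {block_id: [(fact, prob), ...]}.

    Blocks are independent; within a block at most one fact is chosen.
    """
    per_block = []
    for facts in db.values():
        choices = [(frozenset([fact]), prob) for fact, prob in facts]
        # With the remaining probability mass the block contributes no fact.
        choices.append((frozenset(), 1 - sum(prob for _, prob in facts)))
        per_block.append(choices)
    for combination in product(*per_block):
        world, prob = frozenset(), Fraction(1)
        for chosen, p in combination:
            world |= chosen
            prob *= p
        yield world, prob


def contains_repair_of_full_blocks(db, world):
    """True if `world` holds a fact from every block whose probabilities sum to 1."""
    full = [facts for facts in db.values() if sum(p for _, p in facts) == 1]
    return all(any(fact in world for fact, _ in facts) for facts in full)


half, third = Fraction(1, 2), Fraction(1, 3)
db = {
    "R(a)": [("R(a,1)", half), ("R(a,2)", half)],  # probabilities sum to 1
    "S(b)": [("S(b,1)", third)],                   # probabilities sum to 1/3
}

positive = [w for w, p in worlds(db) if p > 0]
assert all(contains_repair_of_full_blocks(db, w) for w in positive)
```

Exact rational arithmetic via `Fraction` is used so that the test "probabilities sum to 1" is not affected by floating-point rounding.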